Build System

Your inference servers are deployed by pushing to main on GitHub.
Banana has a GitHub integration that watches for pushes and automatically builds and deploys that code.
Read more about our GitHub Integration.

Optimization Step

During build, Banana will attempt to recompile your source code so that the server boots faster and inferences run faster.
Optimization is not required, but it is recommended.
If optimization fails, your original source code is still deployed; the failure appears as an "optimization failed" status in the Build Logs.
Required for Optimization:

The Serverless Framework follows a strict but simple format to enable optimizations.
When adding your custom code, adhere to the following criteria:
  • app.py at the root of the repo
  • a function called init() in app.py
  • logic in the init() function to load your ML model into memory as the model variable
    • model must be a global variable
    • model must be a pytorch-based model (this includes most Huggingface models)
If your model can be found as the model object in the init() function of app.py, the optimizer will be able to find it and recompile it for faster coldboots and faster inference.
Hint: if you have multiple models, you will need to create a separate repo per model to get optimizations for all of them. Putting multiple models into a single deployment will work, but only the one assigned to the model variable will be optimized.