Build System
Your inference servers are deployed by pushing to main on GitHub.
Banana has a GitHub integration that watches for pushes and automatically builds and deploys your code.
During the build, Banana attempts to recompile your source code so the server boots faster and inferences run faster.
Optimization is not required, but it is recommended.
If optimization fails, your original source code is still deployed. You can find the optimization-failed status in the Build Logs.
While adding your custom code, adhere to the following criteria (a minimal sketch follows the list):

- an `app.py` file at the root of the repo
- a function called `init()` in `app.py`
- logic in the `init()` function to load your ML model into memory as the `model` variable
- `model` must be a global variable
- `model` must be a pytorch-based model (this includes most Huggingface models)
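As a rough illustration, here is a minimal `app.py` that satisfies these criteria. The specific Huggingface checkpoint is an arbitrary example, not a Banana requirement:

```python
# app.py -- a minimal sketch; the checkpoint below is an arbitrary example.
import torch
from transformers import AutoModelForSequenceClassification

# `model` must be a global variable so the optimizer can find it.
model = None

def init():
    global model
    # Load a pytorch-based model into memory as the global `model`.
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased-finetuned-sst-2-english"
    )
    if torch.cuda.is_available():
        model = model.cuda()
```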
If your model can be found as the `model` object in the `init()` function of `app.py`, the optimizer will be able to find it and recompile it for faster coldboots and faster inference.

Hint: if you have multiple models, you will need to create unique repos per model to get optimizations for all of them. Putting multiple models into a single deployment will work, but only the model named `model` will get optimized.
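To make the hint concrete, here is a sketch of what a two-model deployment might look like (the name `aux_model` is hypothetical): the deployment works, but only the global named `model` is recompiled.

```python
# app.py with two models -- a sketch; only the global named `model`
# is found and optimized. `aux_model` is a hypothetical second model.
from transformers import AutoModel

model = None      # optimized: this is the name the optimizer looks for
aux_model = None  # deployed and usable, but not optimized

def init():
    global model, aux_model
    model = AutoModel.from_pretrained("bert-base-uncased")
    aux_model = AutoModel.from_pretrained("distilbert-base-uncased")
```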