Serverless Framework

The code you run on GPUs
The Banana Serverless Framework is an HTTP server for inference, built to autoscale on serverless GPUs. It's structured to allow build-time optimizations to increase inference speed and reduce cold boots.
The framework is built and deployed using Docker.
Pro Tip: You can deploy any Docker workload onto Banana GPUs. Follow this guide to go beyond the template and build your own Banana-compatible repo from scratch.

Hello World

The framework comes prebuilt with a Hugging Face BERT example as your first Hello World. You can find the source code here on GitHub.
  1. Click here to create a public or private fork from the GitHub template repo.
  2. Click "+ New MODEL" in the Banana app, select your new repo, and deploy.
  3. Patiently wait as it builds and deploys.
You can then call your BERT Hello World model from the SDKs, using the JSON below as model_inputs:
{
  "prompt": "Hello World! I am a [MASK] machine learning model."
}
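
For example, calling it from the Python SDK might look like the following sketch (the banana_dev run helper and the placeholder keys are assumptions here; check the SDK reference for the exact signature):

import banana_dev as banana

# Placeholder credentials: swap in your own from the Banana dashboard.
api_key = "YOUR_API_KEY"
model_key = "YOUR_MODEL_KEY"

model_inputs = {
    "prompt": "Hello World! I am a [MASK] machine learning model."
}

# banana.run sends model_inputs to your deployed model and blocks until
# the inference result comes back.
out = banana.run(api_key, model_key, model_inputs)
print(out["modelOutputs"])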
And the SDK will return outputs such as:
{
  // some metadata
  ...
  "modelOutputs": [
    [
      {
        "score": 0.0529061034321785,
        "token": 3722,
        "token_str": "simple",
        "sequence": "hello world! i am a simple machine learning model."
      },
      {
        "score": 0.050797536969184875,
        "token": 3143,
        "token_str": "complete",
        "sequence": "hello world! i am a complete machine learning model."
      }
    ]
  ]
}

Adding Custom Code

Click here to create a public or private fork from the GitHub template repo, which you may customize to your heart's content.
app.py is the most important file in the framework. It is where you write the Python logic that runs for every inference.
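
As a hedged sketch of that structure (the init/inference layout below follows the public BERT template; your fork's function names may differ), app.py loads the model once at server start, then handles each request:

from transformers import pipeline
import torch

def init():
    # Runs once when the server boots: load the model into a global so
    # every request reuses the same weights.
    global model
    device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
    model = pipeline("fill-mask", model="bert-base-uncased", device=device)

def inference(model_inputs: dict):
    # Runs on every inference call: parse inputs, run the model, return outputs.
    global model
    prompt = model_inputs.get("prompt")
    if prompt is None:
        return {"message": "No prompt provided"}
    return model(prompt)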

Other Important Files in the Framework:

  • download.py - fetches your model weights at build time, so they're baked into the Docker image rather than downloaded at cold boot
  • server.py - the HTTP server that receives inference calls
  • test.py - a local client for sending test payloads to the dev server
  • requirements.txt - the Python packages installed into the image
  • Dockerfile - defines how the whole application builds and deploys

Development and Testing

The template repo has built-in tools for local testing, so be sure to use them before deploying!
We highly recommend developing on a cloud GPU for the tightest feedback loop, if you can afford it. We suggest:
  • Brev - easy cloud GPUs for dev, preconfigured to work with Banana. Click here to use their Banana starter environment.
  • AWS/GCP GPU instances w/ deep learning base images + VS Code's SSH Editor extension
If you cannot afford GPUs, we totally understand. In that case, either:
  • write your code to fall back to CPU when GPUs are not available (see the sketch after this list) and follow the testing steps below
  • or simply yeet untested code into the build pipeline and watch build/runtime logs in Banana for errors
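
For the CPU-fallback route, a minimal PyTorch sketch (this assumes a torch-based model; adapt it to your stack):

import torch

def get_device() -> torch.device:
    # Prefer the GPU when one is visible; otherwise run on CPU so the same
    # code works on a dev laptop and on Banana's GPUs.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Example: move a model and a batch to whichever device is available.
device = get_device()
model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
print(model(batch).shape)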

To test:

1) Install package requirements with:
pip3 install -r requirements.txt
2) Modify and run download.py to get your model weights locally, if needed:
python3 download.py
3) Make your edits to app.py
4) Run the Banana development server with:
python3 server.py
5) Modify and run test.py with your expected JSON payload as model_inputs (see the sketch after these steps):
python3 test.py
6) Repeat steps 2-5 until ready to deploy
7) Verify that the application builds into Docker, with:
docker build -t banana .
8) Verify that the application runs in Docker on the GPUs, with:
docker run --gpus=all -p 8000:8000 banana
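
As a reference for step 5, here is a hedged sketch of a minimal test.py (it assumes the dev server listens on localhost:8000, as the docker run command above suggests; your fork's test.py may differ):

import requests

# The payload mirrors what you'd send as model_inputs through the SDK.
model_inputs = {
    "prompt": "Hello World! I am a [MASK] machine learning model."
}

# Post to the local dev server started by server.py.
res = requests.post("http://localhost:8000/", json=model_inputs)
print(res.json())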
If all of this works, push to the main branch on GitHub to start your build on Banana!