How to Serve Anything on Banana

Getting Started

Banana is flexible enough to run anything you’d like on serverless GPUs.

We historically focus on hosting Python code for ML inference, but nothing’s stopping you from running any GPU workload on Banana, such as FFmpeg, rendering, or anything else you please.

All you need to do is set up a repo that fits this interface:

Basic Server Interface

You need:

  • A Dockerfile at the root of the repository

  • CUDA drivers installed in that Dockerfile, such that docker run --gpus=all your_image correctly uses the GPU

  • An HTTP server

    • Written in any language

    • With two handlers:

      • a GET handler at “/healthcheck” which returns a 200 status code

      • a POST handler at “/”

        • it accepts JSON (the model_inputs value sent in by banana.run())

        • it returns JSON in the response body (not as an attachment)

    • Run on port 8000

    • A CMD line in the Dockerfile which runs your HTTP server when the image starts (see the Dockerfile sketch after this list)
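
As a concrete reference, here is a minimal Dockerfile sketch that satisfies this interface. The base image tag, the app.py entrypoint, and the requirements.txt dependency file are illustrative assumptions, not requirements of the platform:

    # Any CUDA-enabled base image works; nvidia/cuda images ship the
    # CUDA libraries needed for --gpus=all to expose the GPU.
    FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

    # Install Python and project dependencies (assumes a requirements.txt).
    RUN apt-get update && apt-get install -y python3 python3-pip
    WORKDIR /app
    COPY requirements.txt .
    RUN pip3 install -r requirements.txt
    COPY . .

    # The HTTP server must listen on port 8000.
    EXPOSE 8000

    # Start the HTTP server on image start.
    CMD ["python3", "app.py"]

You can sanity-check the image locally with docker build -t your_image . followed by docker run --gpus=all -p 8000:8000 your_image, then hit the two endpoints.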

We recommend:

  • Initializing any large objects (models, for example) as global variables before the HTTP server starts, so that you don’t repeat hefty loads on every call

  • Configuring the HTTP server to run a single worker. CUDA tends to throw errors when you try to multiprocess on shared global variables, and our routing only ever sends one call at a time to each replica anyway, so it’s best to keep the server single-worker. Both recommendations are shown in the sketch below.
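
Putting the interface and both recommendations together, here is a minimal sketch of a conforming server in Python, using only the standard library. The load_model() helper and the echoed output are hypothetical placeholders for your own inference code:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def load_model():
        # Hypothetical placeholder: load your real model here.
        return object()

    # Initialize large objects as globals before the server starts,
    # so the hefty load happens once rather than on every call.
    model = load_model()

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # GET /healthcheck must return a 200 status code.
            if self.path == "/healthcheck":
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok")
            else:
                self.send_response(404)
                self.end_headers()

        def do_POST(self):
            # POST / accepts JSON (the model_inputs from banana.run())
            # and returns JSON in the response body.
            length = int(self.headers.get("Content-Length", 0))
            model_inputs = json.loads(self.rfile.read(length))
            outputs = {"echo": model_inputs}  # run real inference here
            body = json.dumps(outputs).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # HTTPServer handles one request at a time (a single worker),
        # which matches Banana routing one call at a time per replica.
        HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()

With the container running locally, a quick smoke test (the prompt field here is arbitrary) could be:

    import requests
    print(requests.post("http://localhost:8000/", json={"prompt": "hello"}).json())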

You don’t need:

  • A specific language
