Configuring Potassium

At this point you should have gone through Getting Started in a Few Minutes.
Having a premade BERT model in prod is a good start, but you likely have your own model(s) with different requirements to deploy. This guide shows all the ways you can do that!

Customizing your inference code

You can modify any @app.handler() logic in your app to change the inference logic.
For example, to change the hello-world app to return all outputs rather than just the 0th index, change the return statement in the handler from:

```python
return Response(
    json = {"outputs": outputs[0]},
    status=200
)
```

to:

```python
return Response(
    json = {"outputs": outputs},
    status=200
)
```
Save, rerun the app, and then call the model:
```shell
curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello I am a [MASK] model."}' \
    http://localhost:8000/
```
You'll see the new handler logic run and return every output:
"score": 0.13177461922168732,
"sequence": "hello i am a fashion model.",
"token": 4827,
"token_str": "fashion"
"score": 0.1120428815484047,
"sequence": "hello i am a role model.",
"token": 2535,
"token_str": "role"
"score": 0.022045975551009178,
"sequence": "hello i am a model model.",
"token": 2944,
"token_str": "model"}

Changing which model you load

Modify your @app.init function to load the model of your choice and store the model object in the context dictionary.
You can then access the model in any @app.handler() by fetching it from the context dictionary with context.get(...), passing in your key name.
Here's an example:

```python
from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
    context = {
        "model": model,
        "hello": "world"
    }
    return context

@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    return Response(
        json={"outputs": outputs},
        status=200
    )

if __name__ == "__main__":
    app.serve()
```

Loading multiple models into your app

Modify your @app.init function to load multiple models and store them in the context dictionary, each under a different key such as "model1" and "model2".
You can then access them in your inference handlers.
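As a sketch, the init function might look like the following. The loader functions here are stand-ins for real model loading (e.g. pipeline(...) calls), and in your app the init function would carry the @app.init decorator as in the example above:

```python
# Sketch: an init that loads two models under different context keys.
# load_model_one/load_model_two are stand-ins for real model loading.

def load_model_one():
    return "stand-in for model one"

def load_model_two():
    return "stand-in for model two"

def init():
    # In a Potassium app this function is decorated with @app.init
    context = {
        "model1": load_model_one(),
        "model2": load_model_two(),
    }
    return context

# In any handler, pick the model you need:
# model = context.get("model1")
```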

Supporting multiple inference handlers

You can make as many handlers as you'd like, each with a unique path (the paths below are placeholders). For example:

```python
@app.handler("/model1")
def handler_model1(context: dict, request: Request) -> Response:
    # handler code here...

@app.handler("/model2")
def handler_model2(context: dict, request: Request) -> Response:
    # handler code here...
```

Handling larger payloads using third-party storage

Requests sent through Banana have a size limit of 1mb.
This is because we handle many calls at once, and if calls are large in size there are networking bottlenecks that slow the service down for everyone.
To support larger payloads you'll need to use a third-party storage service (of your choice), such as AWS S3 or Google Cloud Storage.
You can see examples of doing this in our potassium example github repo.
Large input payloads:
  • before calling Banana, upload your input to the storage provider of your choice, and send the file name pointing to it to Banana as input
  • in your Potassium app, download the file and run it through your model
Large output payloads:
  • before returning from Banana, upload the output to the third-party storage
  • return a link to this file
  • at your call site, download the resulting file
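To make both flows concrete, here's a minimal sketch using presigned URLs. The payload shape and the function names are illustrative assumptions, not part of the Banana API; adapt them to your storage provider's SDK:

```python
import urllib.request

def build_request_json(input_url: str) -> dict:
    # Send Banana a pointer to the uploaded file instead of the raw bytes
    return {"input_url": input_url}

def download_input(input_url: str) -> bytes:
    # Inside your Potassium handler: fetch the large input before inference
    with urllib.request.urlopen(input_url) as resp:
        return resp.read()

def upload_output(presigned_put_url: str, data: bytes) -> int:
    # Before returning: PUT the large result to storage; return the HTTP status
    req = urllib.request.Request(presigned_put_url, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Your handler would then return a Response containing only the output file's link, and the call site downloads it from storage.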

Supporting >5-minute inferences with background tasks

By default, tasks time out after 5 minutes. You can extend this by creating a handler function with the @app.background() decorator, which runs the handler as a nonblocking job in the background:

```python
# Import the send_webhook helper from Potassium
from potassium import send_webhook

@app.background("/background")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    # Make sure to change the webhook URL to something
    # that can receive the POST JSON payload
    send_webhook(url="http://localhost:8001", json={"outputs": outputs})
```
Since Potassium won't return the inference response of a background task, use the send_webhook() helper function to POST data onward to a URL, or add your own custom upload/pipeline code.
When invoked, the server immediately returns a {"success": true} message, and your task continues running in the background until completion.
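For local testing, here's a minimal stand-in for the receiver at http://localhost:8001. The class and function names are illustrative; any server that accepts a POST JSON payload works:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookReceiver(BaseHTTPRequestHandler):
    # Collect webhook payloads so the rest of your pipeline can consume them
    received = []

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        WebhookReceiver.received.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

def serve_webhook_receiver(port: int = 8001) -> HTTPServer:
    # Serve on a daemon thread so it doesn't block the caller
    server = HTTPServer(("localhost", port), WebhookReceiver)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```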

Migrating an existing non-potassium app to Potassium

The recommended way is to refactor your existing app into a couple of functions, such as "load" and "run". Then create a brand-new Potassium project, import your existing load/run functions like you would a library, and call them from the Potassium init and inference handlers.
This is usually much easier than wiring Potassium into your existing app's code.
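As a sketch of that refactor (the load/run names and the trivial "model" below are placeholders for your real loading and inference code):

```python
# my_model.py — your existing app, refactored into two importable functions.

def load():
    # Called once from the Potassium @app.init; stand-in for real model loading
    return {"hello": "world"}

def run(model, prompt: str):
    # Called per request from the Potassium @app.handler(); stand-in inference
    return model.get(prompt, "unknown")
```

In the new Potassium project you'd then import load and run, call load() inside the init function to build the context, and call run(...) inside the handler.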