On this page, we'll get familiar with Potassium.

Opening it up, you'll see:
```python
from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

# @app.init runs at startup, and loads models into the app's context
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    context = {
        "model": model
    }

    return context

# @app.handler runs for every call
@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs[0]},
        status=200
    )

if __name__ == "__main__":
    app.serve()
```
Each Potassium app has two vital components:

- `@app.init` runs on startup and loads any heavy objects, such as models, into memory. Its return value is saved as the app's context, for use later.
- `@app.handler()` is the HTTP POST request handler, run on every call. In this example, it uses the preloaded model from the context and the prompt from the input JSON to run inference and return the output.
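To see this division of labor in isolation, here's a minimal plain-Python sketch of the same pattern, without Potassium or HTTP. The lambda "model" is a hypothetical stand-in for the real pipeline, not Potassium's API:

```python
# Sketch of the init/handler pattern: init() runs once and returns a
# context dict; handler() reuses that context on every call.

def init():
    # Hypothetical stand-in for an expensive model load.
    model = lambda prompt: [{"sequence": prompt.replace("[MASK]", "world")}]
    return {"model": model}

context = init()  # runs once, at startup

def handler(context, request_json):
    model = context.get("model")          # reuse the preloaded "model"
    outputs = model(request_json["prompt"])
    return {"outputs": outputs[0]}

print(handler(context, {"prompt": "hello [MASK]"}))
# {'outputs': {'sequence': 'hello world'}}
```

The key point is that `init()` is called exactly once, while `handler()` may be called many times against the same context.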
Loading a model from disk to GPU can take many minutes, depending on the model! For this reason, we load models in advance of the handler, so they're hot and ready to go.
In the next section, we'll customize our Potassium app.