Potassium API

Potassium is an open source web framework, built to tackle the unique challenges of serving custom models in production.
The goal of this project is to:
  • Provide a familiar web framework similar to Flask/FastAPI
  • Bake in best practices for handling large, GPU-bound ML models
  • Provide a set of primitives common in ML serving, such as:
    • POST request handlers
    • Background handlers w/ webhooks for async jobs
  • Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on Banana Serverless GPUs 😉)

Stability Notes:

Potassium uses Semantic Versioning, in that major versions imply breaking changes, and v0 implies instability even between minor/patch versions. Be sure to lock your versions, as we're still in v0!
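
For example, assuming the package is published on PyPI as `potassium`, you can pin an exact version in `requirements.txt` (the version below is a placeholder; pin whichever release you tested against):

```text
potassium==0.0.1  # placeholder version; pin the release you tested against
```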

Documentation

potassium.Potassium

from potassium import Potassium

app = Potassium("server")

This instantiates your HTTP app, similar to popular frameworks like Flask.
To run on Banana, this Potassium app must:
  • Use the variable name app in the global scope
  • Launch from the app.py file as the main entrypoint

@app.init

import torch
from transformers import pipeline

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
    return {
        "model": model
    }
The @app.init decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:
  • Your AI model, loaded to GPU
  • Tokenizers
  • Precalculated embeddings
The return value is a dictionary that is saved to the app's context and passed to your handler functions.
There may only be one @app.init function.

@app.handler(route="/your_api_route")

from potassium import Request, Response

@app.handler(route="/your_api_route")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    return Response(
        json={"outputs": outputs},
        status=200
    )
The @app.handler decorated function runs for every HTTP call to the handler's corresponding route, and is used to run inference against your model(s).
You may configure as many @app.handler functions as you'd like, with unique API routes.
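
The routing mechanism can be sketched with a plain decorator that registers functions into a dict keyed by route (an illustrative stand-in for @app.handler, not Potassium's implementation):

```python
# Toy route table mimicking @app.handler(route=...) registration.
routes = {}

def handler(route):
    def register(fn):
        routes[route] = fn  # map the route to its handler function
        return fn
    return register

@handler(route="/fill_mask")
def fill_mask(context, request):
    return {"outputs": "masked"}

@handler(route="/classify")
def classify(context, request):
    return {"outputs": "label"}

print(sorted(routes))  # -> ['/classify', '/fill_mask']
```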

@app.background(route="/your_api_route")

from potassium import Request, send_webhook

@app.background(route="/your_api_route")
def handler(context: dict, request: Request):
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    send_webhook(url="http://localhost:8001", json={"outputs": outputs})
    return
The @app.background() decorated function runs a nonblocking job in the background, for tasks where results aren't expected to return clientside. It's on you to forward the data to wherever you please. Potassium supplies a send_webhook() helper function for POSTing data onward to a url, or you may add your own custom upload/pipeline code.
When invoked, the server immediately returns a {"success": true} message.
You may configure as many @app.background functions as you'd like, with unique API routes.
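
The fire-and-forget behavior can be sketched with a thread: the caller gets a success message immediately, while the job finishes later and forwards its result. This is only illustrative; send_webhook is stubbed out, and the helper names here are invented:

```python
import threading
import time

sent = []  # records what the webhook stub "posted"

def send_webhook_stub(url, json):
    # stand-in for Potassium's send_webhook(); records instead of POSTing
    sent.append((url, json))

def background_handler(context, request):
    time.sleep(0.05)  # stand-in for slow inference
    send_webhook_stub(url="http://localhost:8001", json={"outputs": "done"})

def invoke(handler, context, request):
    # run the handler off the request thread and respond immediately
    job = threading.Thread(target=handler, args=(context, request))
    job.start()
    return {"success": True}, job

response, job = invoke(background_handler, {}, {})
print(response)  # -> {'success': True}, returned before the job completes
job.join()
print(sent)
```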

app.serve()

app.serve() runs the server and is a blocking operation.

Store

Potassium includes a key-value storage primitive, to help users persist data between calls.
We encourage you to use Redis as your backend. Example usage:
from potassium import Potassium, Request, Response
from potassium.store import Store, RedisConfig
from transformers import pipeline
import torch

app = Potassium("server")

store = Store(
    backend="redis",
    config=RedisConfig(
        host="localhost",
        port=6379
    )
)

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
    return {
        "model": model
    }

@app.handler(route="/some_handler")
def set_handler(context: dict, request: Request) -> Response:
    # handler code...
    store.set("key", "value", ttl=60)
    return Response(json={"success": True}, status=200)

@app.handler(route="/another_handler")
def get_handler(context: dict, request: Request) -> Response:
    # handler code...
    value = store.get("key")
    return Response(json={"value": value}, status=200)
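
For intuition, here is a minimal in-memory sketch of a TTL key-value store. Potassium's Store delegates to a real backend such as Redis; this toy class is illustrative only:

```python
import time

class TTLStore:
    """Tiny in-memory key-value store with per-key expiry (illustrative)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl=None):
        # record an absolute expiry time, or None for "never expires"
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        if key not in self._data:
            return None
        value, expires = self._data[key]
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazily evict expired keys
            return None
        return value

store = TTLStore()
store.set("key", "value", ttl=60)
print(store.get("key"))  # -> 'value', well within the 60s ttl
```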