Potassium API
Potassium is an open source web framework, built to tackle the unique challenges of serving custom models in production.
The goals of this project are to:
- Provide a familiar web framework similar to Flask/FastAPI
- Bake in best practices for handling large, GPU-bound ML models
- Provide a set of primitives common in ML serving, such as:
  - POST request handlers
  - Background handlers w/ webhooks for async jobs
- Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on Banana Serverless GPUs 😉)
Potassium uses Semantic Versioning, in that major versions imply breaking changes, and v0 implies instability even between minor/patch versions. Be sure to lock your versions, as we're still in v0!
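Under these rules, only a nonzero, matching major version guarantees compatibility; while in v0, any version change may break you. A minimal sketch of that check (an illustrative helper, not part of Potassium):

```python
def may_break(installed: str, candidate: str) -> bool:
    """True if upgrading installed -> candidate may include breaking changes,
    under semver plus the rule that v0.x is unstable at every level."""
    old = tuple(int(part) for part in installed.split("."))
    new = tuple(int(part) for part in candidate.split("."))
    if old[0] == 0 or new[0] == 0:
        return old != new  # v0: even a patch bump may break
    return old[0] != new[0]  # v1+: only a major bump may break

print(may_break("0.4.1", "0.4.2"))  # True  -- hence: lock your versions in v0
print(may_break("1.2.0", "1.3.5"))  # False
print(may_break("1.9.0", "2.0.0"))  # True
```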
```python
from potassium import Potassium, Request, Response, send_webhook

app = Potassium("server")
```
To run on Banana, this Potassium app must:
- Use the variable name `app` in the global scope
- Launch from the `app.py` file as the main entrypoint
```python
import torch
from transformers import pipeline

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }
```
The `@app.init` decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:
- Your AI model, loaded to GPU
- Tokenizers
- Precalculated embeddings

The return value is a dictionary which saves to the app's `context`, and is used later in the handler functions. There may only be one `@app.init` function.

```python
@app.handler(route="/your_api_route")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )
```
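Conceptually, `init` runs once, its return value becomes the shared `context`, and each request is dispatched to the handler registered for its route. A stdlib sketch of that lifecycle (the names here are illustrative, not Potassium's internals):

```python
class MiniApp:
    """Toy model of the init/handler lifecycle, not Potassium's real implementation."""

    def __init__(self):
        self._init_fn = None
        self._handlers = {}   # route -> handler function
        self._context = None

    def init(self, fn):
        self._init_fn = fn
        return fn

    def handler(self, route):
        def register(fn):
            self._handlers[route] = fn
            return fn
        return register

    def startup(self):
        # init runs exactly once; its return value is the shared context
        self._context = self._init_fn()

    def dispatch(self, route, request_json):
        # every call to a route reuses the context built at startup
        return self._handlers[route](self._context, request_json)

app = MiniApp()

@app.init
def init():
    return {"model": lambda prompt: prompt.upper()}  # stand-in for a real model

@app.handler("/your_api_route")
def handler(context, request_json):
    model = context["model"]
    return {"outputs": model(request_json["prompt"])}

app.startup()
print(app.dispatch("/your_api_route", {"prompt": "hello"}))  # {'outputs': 'HELLO'}
```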
The `@app.handler` decorated function runs for every HTTP call to the handler's corresponding route, and is used to run inference against your model(s). You may configure as many `@app.handler` functions as you'd like, with unique API routes.

```python
@app.background(route="/your_api_route")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    send_webhook(url="http://localhost:8001", json={"outputs": outputs})

    return
```
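Since the background handler returns nothing to the caller, results travel over the webhook. The POST-a-JSON-payload leg can be mimicked with the standard library alone; a sketch with a throwaway local receiver (the helper name and URL are illustrative):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []

class Receiver(BaseHTTPRequestHandler):
    """Throwaway webhook receiver that records each JSON payload it gets."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

def post_json(url, payload):
    """Minimal stand-in for a send_webhook-style helper."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

server = HTTPServer(("localhost", 0), Receiver)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status = post_json(f"http://localhost:{server.server_port}", {"outputs": [1, 2, 3]})
server.shutdown()

print(status, received)  # 200 [{'outputs': [1, 2, 3]}]
```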
The `@app.background()` decorated function runs a nonblocking job in the background, for tasks where results aren't expected to return clientside. It's on you to forward the data to wherever you please. Potassium supplies a `send_webhook()` helper function for POSTing data onward to a URL, or you may add your own custom upload/pipeline code. When invoked, the server immediately returns a `{"success": true}` message.

You may configure as many `@app.background` functions as you'd like, with unique API routes.

`app.serve()` runs the server, and is a blocking operation.

Potassium includes a key-value storage primitive, to help users persist data between calls.
Example usage (we encourage you to use Redis as your backend):
```python
from potassium import Potassium, Request, Response
from potassium.store import Store, RedisConfig
from transformers import pipeline
import torch

app = Potassium("server")

store = Store(
    backend="redis",
    config=RedisConfig(
        host="localhost",
        port=6379
    )
)

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }

@app.handler(route="/some_handler")
def handler(context: dict, request: Request) -> Response:
    # handler code...
    store.set("key", "value", ttl=60)

@app.handler(route="/another_handler")
def handler(context: dict, request: Request) -> Response:
    # handler code...
    value = store.get("key")

if __name__ == "__main__":
    app.serve()
```
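Redis expires keys natively; the `set(..., ttl=...)` / `get(...)` semantics above can be sketched with an in-memory dict (illustrative only, not the `Store` implementation):

```python
import time

class MemoryStore:
    """Toy key-value store with per-key TTL, mimicking set/get with expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

store = MemoryStore()
store.set("key", "value", ttl=60)
print(store.get("key"))      # value
print(store.get("missing"))  # None
```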