Potassium API

Potassium is an open source web framework, built to tackle the unique challenges of serving custom models in production.

The goal of this project is to:

  • Provide a familiar web framework similar to Flask/FastAPI

  • Bake in best practices for handling large, GPU-bound ML models

  • Provide a set of primitives common in ML serving, such as:

    • POST request handlers

    • Background handlers w/ webhooks for async jobs

  • Maintain a standard interface, to allow the code and models to compile to specialized hardware (ideally on Banana Serverless GPUs 😉)

Stability Notes:

Potassium uses Semantic Versioning: major versions imply breaking changes, and v0 implies instability even between minor/patch versions. Be sure to lock your versions (for example, pinning potassium==0.3.0 in your requirements.txt), as we're still in v0!


Documentation

potassium.Potassium

from potassium import Potassium

app = Potassium("server")

This instantiates your HTTP app, similar to popular frameworks like Flask.

To run on Banana, this Potassium app must:

  • Use the variable name app in the global scope

  • Launch from the app.py file as the main entrypoint


@app.init

from transformers import pipeline
import torch

@app.init
def init():
    # Load the model onto GPU if one is available, otherwise CPU
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }

The @app.init decorated function runs once on server startup, and is used to load any reusable, heavy objects such as:

  • Your AI model, loaded to GPU

  • Tokenizers

  • Precalculated embeddings

The return value is a dictionary that is saved to the app's context and passed to the handler functions.

There may only be one @app.init function.


@app.handler(route="/your_api_route")

from potassium import Request, Response

@app.handler(route="/your_api_route")
def handler(context: dict, request: Request) -> Response:
    # Pull the prompt from the request body and the model from the app context
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )

The @app.handler decorated function runs for every HTTP call to the handler's corresponding route, and is used to run inference against your model(s).

You may configure as many @app.handler functions as you'd like, with unique API routes.
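
For example, with the server running locally, you could call the handler above using the requests library. A sketch, assuming Potassium's default port of 8000:

import requests

# POST to the handler's route; localhost:8000 assumes the default port
res = requests.post(
    "http://localhost:8000/your_api_route",
    json={"prompt": "Hello I'm a [MASK] model."}
)
print(res.json()["outputs"])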


@app.background(route="/your_api_route")

from potassium import Request, send_webhook

@app.background(route="/your_api_route")
def handler(context: dict, request: Request):

    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    # Forward results onward, since nothing is returned to the caller
    send_webhook(url="http://localhost:8001", json={"outputs": outputs})

    return

The @app.background() decorated function runs as a nonblocking job in the background, for tasks where results aren't expected to return client-side. It's on you to forward the data wherever you please. Potassium supplies a send_webhook() helper function for POSTing data onward to a URL, or you may add your own custom upload/pipeline code.

When invoked, the server immediately returns a {"success": true} message.

You may configure as many @app.background functions as you'd like, with unique API routes.
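
A call to a background route returns immediately, while inference and the webhook run in the background. A sketch, again assuming the default port of 8000:

import requests

# Returns {"success": true} right away; results arrive later via send_webhook
res = requests.post(
    "http://localhost:8000/your_api_route",
    json={"prompt": "Hello I'm a [MASK] model."}
)
print(res.json())  # {'success': True}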


app.serve()

app.serve() runs the server, and is a blocking operation.
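
Putting it all together, a minimal app.py (assembled from the examples above) looks like:

from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("server")

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
    return {"model": model}

@app.handler(route="/your_api_route")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    return Response(json={"outputs": outputs}, status=200)

if __name__ == "__main__":
    app.serve()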


Pre-warming your app

As of version 0.3.0, Potassium comes with a built-in endpoint for cases where you want to "warm up" your app to better control the timing of your inference calls. Calling it is optional, since init() runs once on server startup before any inference call anyway, but hitting the endpoint explicitly gives you more control over when that cold boot happens.

Once your model is warm (i.e., cold boot finished), this endpoint returns a 200. If a cold boot is required, the init() function is first called while the server starts up, and then a 200 is returned from this endpoint.

You don't need any extra code to enable it; it comes out of the box, and you can call it at /_k/warmup as either a GET or POST request.
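
For example, assuming the app is serving on its default port of 8000:

import requests

# Blocks until init() has finished (cold boot), then returns 200
res = requests.get("http://localhost:8000/_k/warmup")
assert res.status_code == 200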

This is also available in our SDKs as a .warmup() function on the client instance.


Store

Potassium includes a key-value storage primitive, to help users persist data between calls.

Example usage (we encourage you to use Redis as your backend):

from potassium import Potassium, Request, Response
from potassium.store import Store, RedisConfig
from transformers import pipeline
import torch

app = Potassium("server")

store = Store(
    backend="redis",
    config=RedisConfig(
        host="localhost",
        port=6379
    )
)

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    return {
        "model": model
    }

@app.handler(route="/some_handler")
def handler(context: dict, request: Request) -> Response:
    # handler code...
    store.set("key", "value", ttl=60)
    return Response(json={"success": True}, status=200)

@app.handler(route="/another_handler")
def handler(context: dict, request: Request) -> Response:
    # handler code...
    value = store.get("key")
    return Response(json={"value": value}, status=200)
