Configuring Potassium

At this point you should have gone through Getting Started in a Few Minutes.

Having a premade BERT project in prod is a good start, but you've likely got your own project(s) with different requirements to deploy. This guide shows all the ways you can do that!

Customizing your inference code

You can modify the logic in any @app.handler() in your app to change how inference runs.

For example, to make the hello-world app return all outputs rather than just the 0th index, change the return statement in the handler defined in app.py

from:

return Response(
    json = {"outputs": outputs[0]}, 
    status=200
)

to:

return Response(
    json = {"outputs": outputs}, 
    status=200
)

Save app.py and rerun:

python3 app.py

Then call the project:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/

You'll see the new handler logic run:

{"outputs":[
    {
        "score": 0.13177461922168732,
        "sequence": "hello i am a fashion model.",
        "token": 4827,
        "token_str": "fashion"
    },
    {
        "score": 0.1120428815484047,
        "sequence": "hello i am a role model.",
        "token": 2535, 
        "token_str": "role"
    },
    ...
    {
        "score": 0.022045975551009178,
        "sequence": "hello i am a model model.",
        "token": 2944,
        "token_str": "model"
    }
]}

Changing which model you load

Modify your @app.init function to load the model of your choice and store the model object in the context dictionary.

You access the model in any @app.handler() by fetching it from the context dictionary with context.get(...), passing in your key name.

Here's an example:

from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")


@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

    context = {
        "model": model,
        "hello": "world"
    }

    return context


@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )


if __name__ == "__main__":
    app.serve()

Loading multiple models into your app

Modify your @app.init function to load multiple models and store them in the context dictionary, each under a different key, such as "model1" and "model2".

You can then access them in your inference handlers.
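
For example, here's a minimal sketch of an init that loads two pipelines; the pipeline tasks and key names ("model1", "model2") are just placeholder choices for illustration:

from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")


@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1

    # Load each model once at startup and store it under its own key
    context = {
        "model1": pipeline('fill-mask', model='bert-base-uncased', device=device),
        "model2": pipeline('sentiment-analysis', device=device),
    }

    return context


@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")

    # Fetch whichever model this handler needs by its key
    outputs = context.get("model2")(prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )


if __name__ == "__main__":
    app.serve()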

Supporting multiple inference handlers

You can make as many handlers as you'd like, each with a unique path. For example:

@app.handler("/some_path")
def some_path_handler(context: dict, request: Request) -> Response:
    # handler code here...
    return Response(json={"outputs": "..."}, status=200)

@app.handler("/another_path")
def another_path_handler(context: dict, request: Request) -> Response:
    # handler code here...
    return Response(json={"outputs": "..."}, status=200)
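
Each handler is served at its path on the same server, so (assuming the app is running locally on port 8000 as in the earlier examples, and that the payload matches what your handlers actually expect) you can call them like this:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/some_path

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/another_path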

Handling payloads using third-party storage

Sometimes it's more convenient to handle payloads through third-party storage. For example, if calls are large you might hit networking bottlenecks and find it faster to load the actual data from a storage bucket like S3.

You can see examples of doing this in our Potassium examples GitHub repo.

Large input payloads:

  • Before calling Banana, upload your input to the storage provider of your choice, and send Banana the file name pointing to it as input (see the sketch after these lists)

  • In your Potassium app, download the file and run it through your model

Large output payloads:

  • Before returning from Banana, upload the result to the third-party storage

  • Return a link to this file

  • At your call site, download the resulting file
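
Here's a rough sketch of both patterns using S3 via boto3. The bucket names, the s3_key field, and the /from_s3 path are placeholder choices for this example, not part of the Potassium API:

import json

import boto3
import torch
from potassium import Potassium, Request, Response
from transformers import pipeline

app = Potassium("my_app")
s3 = boto3.client("s3")


@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    return {"model": pipeline('fill-mask', model='bert-base-uncased', device=device)}


@app.handler("/from_s3")
def handler(context: dict, request: Request) -> Response:
    # The caller uploads the real payload to S3 first and only sends the object key
    key = request.json.get("s3_key")
    text = s3.get_object(Bucket="my-input-bucket", Key=key)["Body"].read().decode("utf-8")

    model = context.get("model")
    outputs = model(text)

    # For large outputs, upload the result and return a pointer to it instead
    result_key = f"results/{key}"
    s3.put_object(Bucket="my-output-bucket", Key=result_key, Body=json.dumps(outputs).encode("utf-8"))

    return Response(
        json={"result_key": result_key},
        status=200
    )


if __name__ == "__main__":
    app.serve()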

Supporting long-running inferences with background tasks

You can create a handler function with the @app.background() decorator to run the handler as a non-blocking job in the background:

# Import the send_webhook helper from Potassium
from potassium import send_webhook

@app.background("/background")
def handler(context: dict, request: Request) -> Response:

    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    # Make sure to change the webhook URL to something 
    # that can receive the POST JSON payload
    send_webhook(url="http://localhost:8001", json={"outputs": outputs})

    return

Since Potassium won't return the inference response of a background task, use the send_webhook() helper function to POST data onward to a URL, or add your own custom upload/pipeline code.

When invoked, the server immediately returns a {"success": true} message and your task will continue running in the background until completion.
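
For example, assuming the app is running locally on port 8000 as in the earlier examples, calling the background endpoint defined above returns right away, and the outputs arrive later at the webhook URL:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/background

{"success": true}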

Migrating an existing non-Potassium app to Potassium

The recommended way is to refactor your existing app into a couple of functions, such as "load" and "run". Then make a brand-new Potassium project, import your existing load/run functions like you would a library, and call them from the Potassium init and inference handlers.

This is usually much easier than wiring Potassium into your existing app's code.
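
As a rough sketch, suppose your existing code has been refactored into a hypothetical my_model module exposing load() and run() functions (both names are placeholders); the Potassium wrapper can then be as thin as this:

from potassium import Potassium, Request, Response

# Your existing, unchanged code, refactored into two functions:
#   my_model.load()              -> returns the loaded model object
#   my_model.run(model, prompt)  -> runs inference and returns the outputs
import my_model

app = Potassium("my_app")


@app.init
def init():
    # Reuse your existing load function to build the context
    return {"model": my_model.load()}


@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")

    # Reuse your existing run function for inference
    outputs = my_model.run(context.get("model"), prompt)

    return Response(
        json={"outputs": outputs},
        status=200
    )


if __name__ == "__main__":
    app.serve()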
