Configuring Potassium
Having a premade BERT model in prod is a good start, but you've likely got your own model(s) with different requirements to deploy. This guide shows all the ways you can do that!
You can modify any `@app.handler()` function in your app to change inference logic. For example, to make the hello-world app return all outputs rather than just the 0th index, change the return statement in the handler defined in `app.py`
from:

```python
return Response(
    json={"outputs": outputs[0]},
    status=200
)
```

to:

```python
return Response(
    json={"outputs": outputs},
    status=200
)
```
Save `app.py` and rerun:

```bash
python3 app.py
```
Then call the model:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/
```
You'll see the new handler logic run:

```json
{"outputs": [
  {
    "score": 0.13177461922168732,
    "sequence": "hello i am a fashion model.",
    "token": 4827,
    "token_str": "fashion"
  },
  {
    "score": 0.1120428815484047,
    "sequence": "hello i am a role model.",
    "token": 2535,
    "token_str": "role"
  },
  ...
  {
    "score": 0.022045975551009178,
    "sequence": "hello i am a model model.",
    "token": 2944,
    "token_str": "model"
  }
]}
```
Modify your `@app.init` function to load the model of your choice and store the model object in the context dictionary. You can then access the model in any `@app.handler()` by fetching it from the context dictionary with `context.get(...)` and passing in your key name.

Here's an example:
```python
from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
    context = {
        "model": model,
        "hello": "world"
    }
    return context

@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    return Response(
        json={"outputs": outputs},
        status=200
    )

if __name__ == "__main__":
    app.serve()
```
Modify your `@app.init` function to load multiple models and store them in the context dictionary, each with a different key such as "model1" and "model2". You can then access them in your inference handlers, as in the sketch below.
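For example, a minimal sketch of a multi-model init; the second pipeline here is an illustrative assumption, not part of the hello-world app:

```python
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    # Load two models and store each under its own key
    model1 = pipeline('fill-mask', model='bert-base-uncased', device=device)
    model2 = pipeline('sentiment-analysis', device=device)  # assumed second model, for illustration
    return {
        "model1": model1,
        "model2": model2
    }
```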
You can make as many handlers as you'd like, each with a unique path, for example:

```python
@app.handler("/some_path")
def some_path_handler(context: dict, request: Request) -> Response:
    # handler code here...
    ...

@app.handler("/another_path")
def another_path_handler(context: dict, request: Request) -> Response:
    # handler code here...
    ...
```
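Each handler is then reachable at its own path:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello I am a [MASK] model."}' http://localhost:8000/some_path
```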
Requests sent through Banana have a size limit of 1 MB.
This is because we handle many calls at once, and if calls are large there are networking bottlenecks that slow the service down for everyone.
To support larger payloads you'll need to use a third-party service (of your choice), such as AWS S3 or Google Cloud Storage.
Large input payloads:
- Before calling Banana, upload your input to the storage provider of your choice, and send the file name pointing to it to Banana as input
- In your Potassium app, download the file and run it through your model (see the sketch after this list)

Large output payloads:
- Before returning from Banana, upload the output to the third-party storage
- Return a link to this file
- At your call site, download the resulting file
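For large inputs, here's a minimal sketch of the download step using boto3; the bucket name and the `input_key` request field are assumptions for illustration:

```python
import boto3

s3 = boto3.client("s3")

@app.handler("/large_input")
def large_input_handler(context: dict, request: Request) -> Response:
    # "input_key" and the bucket name below are illustrative assumptions
    key = request.json.get("input_key")
    s3.download_file("my-inference-bucket", key, "/tmp/input.txt")
    with open("/tmp/input.txt") as f:
        prompt = f.read()
    model = context.get("model")
    outputs = model(prompt)
    return Response(json={"outputs": outputs}, status=200)
```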
By default, tasks time out after 5 minutes. You can extend this by creating a handler function with the `@app.background()` decorator to run the handler as a nonblocking job in the background:

```python
# Import the send_webhook helper from Potassium
from potassium import send_webhook

@app.background("/background")
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)
    # Make sure to change the webhook URL to something
    # that can receive the POST JSON payload
    send_webhook(url="http://localhost:8001", json={"outputs": outputs})
    return
```
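To test locally, here's a minimal sketch of a server that can receive that webhook. Flask is an assumption here; any server that accepts the POST JSON payload works:

```python
from flask import Flask, request

receiver = Flask(__name__)

@receiver.route("/", methods=["POST"])
def receive():
    # Print the JSON payload posted by send_webhook
    print(request.json)
    return {"received": True}

if __name__ == "__main__":
    receiver.run(port=8001)
```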
Since Potassium won't return the inference response of a background task, use the `send_webhook()` helper function to POST data onward to a URL, or add your own custom upload/pipeline code. When invoked, the server immediately returns a `{"success": true}` message, and your task continues running in the background until completion.

The recommended way to port an existing app is to refactor it into a couple of functions such as "load" and "run". Then make a brand-new Potassium project, import your existing load/run functions like you would a library, and call them from the Potassium init and inference handlers.
This is usually much easier than wiring Potassium into your existing app's code.
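A minimal sketch of that pattern, assuming your existing code exposes `load()` and `run()` from a module named `my_model` (both names are assumptions):

```python
from potassium import Potassium, Request, Response

# Hypothetical module exposing your existing load/run functions
from my_model import load, run

app = Potassium("my_app")

@app.init
def init():
    model = load()  # your existing model-loading code
    return {"model": model}

@app.handler()
def handler(context: dict, request: Request) -> Response:
    model = context.get("model")
    outputs = run(model, request.json.get("prompt"))  # your existing inference code
    return Response(json={"outputs": outputs}, status=200)

if __name__ == "__main__":
    app.serve()
```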