Model Changes

Let's get to it 🙌 One detail to get out of the way: you don't actually migrate your current model in place. Instead, we recommend creating a new model that's a V2 version of your V1 model. The migration workflow will look something like this:

Create a new GitHub repo => create a new model from it => migrate that model => redirect traffic from V1 to V2 => confirm that everything works => delete the old model

This way you'll never have any downtime and you can revert to the V1 model if needed 👍

Potassium

Potassium is the server framework you'll use to deploy your models on V2. This is probably the biggest change you'll face when migrating, so let's go through it thoroughly. We'll use the BERT serverless template as an example V1 model and migrate it.

Let's migrate the BERT serverless template

The easiest way to get a Potassium app scaffolded is using the Banana CLI. Get it on your machine by running:

pip3 install banana-cli

You don't need the CLI, but it's a nice convenience. Once you have it, you can initialize a Potassium app by running:

banana init bert-v2    # change "bert-v2" to the name you like
cd bert-v2             # banana init creates the repo, so cd into it

If you get an error such as "command not found: banana", you probably haven't added the Python package bin location to your PATH variable. If that's the case, on Mac try something like: export PATH="/Users/<me>/Library/Python/3.9/bin:$PATH"

Once you cd into the directory you'll have the scaffold for Potassium ready to go 🔥 If you're setting things up manually, an app.py and a Dockerfile are the minimum requirement, but we highly recommend using the download.py utility (sketched below), a requirements.txt and a .gitignore as well.

📦 bert-v2/
├ 🚀 app.py         # this is now the server + the app
├ ⬇️ download.py    # utility file for downloading model weights
├ 🐋 Dockerfile     # to containerize everything
└  ...              # additional stuff like .gitignore, requirements and so on
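For reference, here's a minimal sketch of what the download.py step does; the file banana init generates may differ, so treat this as illustrative rather than the exact scaffold:

# download.py -- illustrative sketch; the generated scaffold may differ
# Runs at build time (typically invoked from the Dockerfile) so the model
# weights get baked into the image instead of fetched on every cold start.
from transformers import pipeline

def download_model():
    # Instantiating the pipeline downloads and caches the weights locally
    pipeline('fill-mask', model='bert-base-uncased')

if __name__ == "__main__":
    download_model()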

Now, activate your virtual environment, e.g.

. ./venv/bin/activate

And run your Potassium app locally:

python3 app.py

To make sure that it works, you can call your server from another terminal by running:

curl -X POST \
-H "Content-Type: application/json" \
-d '{"prompt": "Software developers start with a Hello, [MASK]! script."}' \
http://localhost:8000/
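If you'd rather test from Python, here's an equivalent request using the requests library (a small sketch, assuming the app is running on localhost:8000 as above):

import requests

res = requests.post(
    "http://localhost:8000/",
    json={"prompt": "Software developers start with a Hello, [MASK]! script."},
)
print(res.status_code)  # expect 200
print(res.json())       # {"outputs": [...fill-mask predictions...]}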

Now you're running a Potassium app locally, awesome 💪

What's in app.py

Okay, so this is maybe cheating a bit, since the default scaffold is a BERT model after all. So what do you need to change in your own app to use the Potassium framework? Let's look at app.py to understand what's going on.

The default app.py looks like this:

from potassium import Potassium, Request, Response
from transformers import pipeline
import torch
import time

app = Potassium("my_app")

# @app.init runs at startup, and initializes the app's context
@app.init
def init():
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)
   
    context = {
        "model": model,
    }

    return context

# @app.handler is an http post handler running for every call
@app.handler()
def handler(context: dict, request: Request) -> Response:
    
    prompt = request.json.get("prompt")
    model = context.get("model")
    outputs = model(prompt)

    return Response(
        json = {"outputs": outputs}, 
        status=200
    )

if __name__ == "__main__":
    app.serve()

If you've worked with the Serverless Template, this should look familiar. To refresh your memory, this is the app.py in the Serverless Template:

from transformers import pipeline
import torch

# Init runs on server startup
# Load your model to GPU as a global variable here using the variable name "model"
def init():
    global model
    
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline('fill-mask', model='bert-base-uncased', device=device)

# Inference runs for every server call
# Reference your preloaded global model variable here.
def inference(model_inputs: dict) -> dict:
    global model

    # Parse out your arguments
    prompt = model_inputs.get('prompt', None)
    if prompt is None:
        return {'message': "No prompt provided"}
    
    # Run the model
    result = model(prompt)

    # Return the results as a dictionary
    return result

Key takeaways:

  • init() runs at server startup in both cases, but in Potassium you don't need to make the model a global object. Instead, you pass it into the context, which is accessible in every handler.

  • inference() --> @app.handler(). Instead of an inference function, you put the decorator over your function, which turns it into an HTTP endpoint (see the sketch after this list).

    • This also gives you access to the context, plus the Request & Response objects.
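As a concrete example of those points, here's a sketch of how the old template's "no prompt provided" guard could carry over into the Potassium handler (assuming the same BERT pipeline sits in the context, as in the scaffold above):

@app.handler()
def handler(context: dict, request: Request) -> Response:
    prompt = request.json.get("prompt")
    if prompt is None:
        # Mirror the old template's guard, now with a proper HTTP status code
        return Response(json={"message": "No prompt provided"}, status=400)

    model = context.get("model")
    outputs = model(prompt)

    return Response(json={"outputs": outputs}, status=200)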

Pro tip when developing a Potassium app: If it runs locally -- it should run remotely

Alright, now we have a Potassium app, let's deploy it! 🚀

The final step: deploy

Now for the simple stuff. To deploy the model you'll need to:

  1. Create a model/repo on GitHub (this should be familiar from before)

  2. Push local changes to the main branch on GitHub

  3. Go to the model settings (image below) --> change to V2

  4. This will trigger a rebuild, which takes about 5-15 minutes. Grab some coffee ☕

  5. Build is done 🚀

Now let's move to the next section on how to call the model -->
