Configuring Project Settings

This page explains project settings that need more context and aren't covered elsewhere on this wiki.

Idle Timeout

After your project finishes running inference, the replica remains on standby, waiting for new calls, for the duration set as Idle Timeout in your project settings.

Inference Timeout

This is the maximum time, in seconds, that an inference call can run. Lowering it caps the cost of an individual call, at the risk of aborting a call that needs to run longer.

By default, calls can run for up to 5 minutes. If you need more than 5 minutes, you'll need to configure a background handler, which is described in the Potassium API.

With a background handler, inferences on Banana can run for up to 15 minutes. If you need more, please reach out to us by following How to Get Support.
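The limits above can be turned into a quick check when planning a project. The following is a minimal sketch, not part of Banana or Potassium: the function name `handler_kind` and its return labels are hypothetical, and the only facts it encodes are the 5-minute and 15-minute ceilings stated above.

```python
# Ceilings taken from the text above; everything else here is illustrative.
SYNC_LIMIT_S = 5 * 60         # default maximum for a regular inference call
BACKGROUND_LIMIT_S = 15 * 60  # maximum when using a background handler


def handler_kind(expected_seconds: float) -> str:
    """Suggest which setup an inference of the given length would need.

    Hypothetical helper: the labels it returns are not Banana terminology.
    """
    if expected_seconds <= 0:
        raise ValueError("expected_seconds must be positive")
    if expected_seconds <= SYNC_LIMIT_S:
        return "regular"
    if expected_seconds <= BACKGROUND_LIMIT_S:
        return "background"
    return "contact-support"
```

For instance, `handler_kind(480)` suggests a background handler, since 8 minutes exceeds the default limit but fits within 15 minutes.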

Max Replicas

This is the maximum number of replicas your project will scale up to at peak traffic. For example, if you set max replicas = 1, you'll never have more than 1 replica running. Any call that comes in and can't find an available replica returns an error indicating that no GPU is available.
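To make the effect concrete, here is a rough model of a burst of simultaneous calls, assuming each replica handles one call at a time and that calls beyond the limit are rejected rather than queued, as described above. The function name is hypothetical and the model ignores timing entirely.

```python
from typing import Tuple


def split_simultaneous_calls(calls: int, max_replicas: int) -> Tuple[int, int]:
    """Return (served, rejected) for a burst of calls arriving at once.

    Simplified model: one call per replica, no queueing, no retries.
    """
    if calls < 0 or max_replicas < 0:
        raise ValueError("calls and max_replicas must be non-negative")
    served = min(calls, max_replicas)
    return served, calls - served
```

Under this model, 10 calls at once with max replicas = 1 means one call is served and nine receive the no-GPU error, so a client using a low limit should send calls one after another or retry on that error.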

You can potentially save a lot of money with this setting. For example, if your cold boots take 20 seconds, your inference takes 1 second, and you make 10 calls at once, you'll get 10 replicas × 20-second cold boot + 10 × 1-second inferences = 210 seconds. Alternatively, if you call your project sequentially and set max replicas = 1, you'll get 1 × 20-second cold boot + 10 × 1-second inferences = 30 seconds. That's 7x cheaper!
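The arithmetic can be checked with a short sketch. It assumes a simplified cost model in which each replica pays one cold boot and every call pays one inference, and it ignores any time spent in idle timeout. The function name `total_seconds` is illustrative. Counting all ten 1-second inferences in both scenarios, the sequential total comes to 30 seconds.

```python
def total_seconds(calls: int, cold_boot_s: float, inference_s: float, replicas: int) -> float:
    """Total compute seconds: one cold boot per replica plus one inference per call."""
    return replicas * cold_boot_s + calls * inference_s


# 10 calls at once, each landing on its own freshly booted replica.
parallel = total_seconds(calls=10, cold_boot_s=20, inference_s=1, replicas=10)

# The same 10 calls sent one after another to a single replica (max replicas = 1).
sequential = total_seconds(calls=10, cold_boot_s=20, inference_s=1, replicas=1)

print(parallel, sequential, parallel / sequential)
```

The saving grows with the ratio of cold boot time to inference time, so the setting matters most for projects with slow boots and fast inferences.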
