Banana bills for the time your inference servers are running on GPUs. This includes:

Model Load

Often called the cold boot, this is the time it takes to load the model from disk into GPU RAM and get it running.

Inference Time

The time for which the server is live and handling calls from the queue.

Idle Timeout

The time a server remains idle before shutting down. The server is still running during this window, so the window is billed.


Imagine you have an image generation model you want to host:


  • Your model takes 10 seconds to generate an image

  • We can cold boot your model in 5 seconds

  • Your idle timeout is set to 10 seconds

Scenario 1: You call your model 1 time

  • Cost = 5 (cold-boot) + 10 (inference) + 10 (timeout) = 25 seconds of GPU time

Scenario 2: You call your model 100 times back to back

  • Cost = 5 (cold-boot) + 10 (inference) × 100 + 10 (timeout) = 1015 seconds of GPU time

Scenario 3: You call your model multiple times concurrently

  • Cost = 5 seconds of cold boot per replica + 10 seconds of timeout per replica + 10 seconds for every inference. The number of replicas depends on your autoscaling settings.
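The arithmetic in the three scenarios can be sketched as a small Python helper. This is only an illustrative estimate, not part of any Banana SDK: the function name `billed_gpu_seconds` and its parameters are hypothetical, and it assumes each replica cold boots once, handles its calls back to back, and then sits idle for one full timeout before shutting down.

```python
def billed_gpu_seconds(num_calls, inference_s=10, cold_boot_s=5,
                       idle_timeout_s=10, replicas=1):
    """Estimate billed GPU seconds for one burst of calls.

    Hypothetical helper for illustration. Assumes every replica pays one
    cold boot and one idle timeout, and that calls never leave a gap long
    enough for a server to shut down mid-burst.
    """
    if num_calls <= 0:
        return 0  # no calls means no server is started
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Fixed overhead paid once per replica: cold boot plus idle timeout.
    per_replica_overhead = cold_boot_s + idle_timeout_s
    # Every inference is billed, regardless of which replica serves it.
    return replicas * per_replica_overhead + num_calls * inference_s


print(billed_gpu_seconds(1))                # Scenario 1: 25
print(billed_gpu_seconds(100))              # Scenario 2: 1015
print(billed_gpu_seconds(100, replicas=4))  # Scenario 3 with 4 replicas: 1060
```

The replica count in the last call is an arbitrary example. In practice it is determined by your autoscaling settings and the shape of your traffic, so treat the result as a rough upper-level estimate.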

As a general rule, you can make Banana faster by paying more to keep machines running, or you can make it more economical by tolerating longer wait times for calls. Please also see Configuring Project Settings for some ways to optimize costs.
