Billing
Banana bills for the amount of time your inference servers are running on GPUs. This includes:
- Cold-boot time, while your model spins up on a replica
- Inference time, while your model handles calls
- Idle timeout, while a replica stays warm waiting for more calls before scaling down
Imagine you have an image generation model you want to host.
Assume:
- Your model takes 10 seconds to generate an image
- We can cold boot your model in 5 seconds
- A replica stays warm for a 10-second idle timeout after its last call before scaling down
Scenario 1: You call your model 1 time
- Cost = 5 (cold-boot) + 10 (inference) + 10 (timeout) = 25 seconds of GPU time
Scenario 2: You call your model 100 times back to back
- Cost = 5 (cold-boot) + 10 (inference)*100 + 10 (timeout) = 1015 seconds of GPU time
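Both scenarios follow the same formula: one cold boot, plus inference time per call, plus one idle timeout. Here is a minimal sketch of that arithmetic in Python (illustrative only, not Banana's actual billing code), using the numbers from the assumptions above:

```python
# Minimal sketch of the billing arithmetic above (illustrative only,
# not Banana's actual billing code). All times are in seconds.
COLD_BOOT = 5      # time to cold-boot the model
INFERENCE = 10     # time to generate one image
IDLE_TIMEOUT = 10  # time a replica stays warm after its last call

def billed_seconds(calls: int) -> int:
    """GPU seconds billed for back-to-back calls on a single replica."""
    return COLD_BOOT + INFERENCE * calls + IDLE_TIMEOUT

print(billed_seconds(1))    # Scenario 1: 25 seconds
print(billed_seconds(100))  # Scenario 2: 1015 seconds
```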
Scenario 3: You call your model multiple times concurrently
- Cost = 5 seconds of cold-boot per replica + 10 seconds of timeout per replica + 10 seconds per inference. How many replicas spin up depends on your autoscaling settings; see the sketch below.
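In the concurrent case, each replica pays its own cold boot and idle timeout, while inference time is paid once per call regardless of which replica serves it. A sketch under the same assumptions; the replica count of 4 here is hypothetical, since the real number comes from your autoscaling settings:

```python
COLD_BOOT, INFERENCE, IDLE_TIMEOUT = 5, 10, 10  # seconds, from the assumptions above

def billed_seconds_concurrent(calls: int, replicas: int) -> int:
    """GPU seconds billed across all replicas serving `calls` calls."""
    # Each replica pays its own cold boot and idle timeout; inference
    # time is paid once per call, whichever replica serves it.
    return replicas * (COLD_BOOT + IDLE_TIMEOUT) + INFERENCE * calls

# e.g. 100 concurrent calls that autoscaling spreads over 4 replicas
# (4 is a made-up count; the real number depends on your settings):
print(billed_seconds_concurrent(100, 4))  # 4*(5+10) + 10*100 = 1060 seconds
```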
As a general rule, you can make Banana faster by paying more to keep machines running, or you can make it more economical by tolerating longer wait times for calls. Please also see Configuring Model Settings for some ways to optimize costs.
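To make that trade-off concrete, here is a rough back-of-the-envelope comparison with made-up traffic numbers: a replica kept always running bills a full day of GPU time regardless of traffic, while scaling to zero bills only per call, at the cost of a cold boot on each one:

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds
PER_CALL = 5 + 10 + 10                    # cold boot + inference + idle timeout
CALLS_PER_DAY = 500                       # made-up traffic level

always_on = SECONDS_PER_DAY               # replica never scales to zero
scale_to_zero = CALLS_PER_DAY * PER_CALL  # worst case: every call cold-boots

print(f"always-on:     {always_on} GPU-seconds/day")     # 86400
print(f"scale-to-zero: {scale_to_zero} GPU-seconds/day")  # 12500
```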