How to Use Nano Banana via API (2026)
Complete API integration guide for Nano Banana AI inference
Banana (now commonly referred to as Nano Banana in its latest iteration) is a serverless GPU inference platform that lets developers deploy and run machine learning models via a simple API. If you need fast, scalable inference for image generation, language models, or custom ML models without managing GPU infrastructure, Banana provides a straightforward solution.
This guide covers everything you need to know to integrate Nano Banana into your application, with working code examples in Python and JavaScript.
What Is Nano Banana?
Nano Banana is the lightweight, cost-optimized tier of the Banana inference platform. It focuses on:
- Cold start optimization: Models spin up faster with pre-warmed containers
- Pay-per-second billing: You only pay for actual compute time
- Simple REST API: Standard HTTP requests, no SDKs required
- Custom model support: Deploy any model from Hugging Face or your own checkpoints
Prerequisites
Before getting started, you need:
- A Banana account at banana.dev
- An API key from your Banana dashboard
- A model ID (either a pre-built model or one you have deployed)
- Python 3.8+ or Node.js 18+ for the code examples
Step 1: Get Your API Credentials
Sign up at banana.dev and navigate to the API Keys section in your dashboard. You will need:
- API Key: Your authentication token
- Model Key: The unique identifier for the model you want to call
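In production you should keep these credentials out of source code. A minimal sketch, assuming you export them as environment variables; the variable names here are illustrative, not required by the platform:

```python
import os

# Hypothetical variable names; set these in your shell or deployment config
api_key = os.environ["BANANA_API_KEY"]
model_key = os.environ["BANANA_MODEL_KEY"]
```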
Step 2: Make Your First API Call (Python)
Install the Banana Python SDK:
```bash
pip install banana-dev
```
Here is a basic example calling a text-to-image model:
```python
import banana_dev as banana

api_key = "your-api-key"
model_key = "your-model-key"

payload = {
    "prompt": "a futuristic city at sunset, cyberpunk style, 4k",
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024,
}

result = banana.run(api_key, model_key, payload)
print(result["modelOutputs"])
```
Step 3: Make Your First API Call (JavaScript)
Using the REST API directly with fetch:
```javascript
const API_KEY = "your-api-key";
const MODEL_KEY = "your-model-key";

async function runInference() {
  const response = await fetch("https://api.banana.dev/start/v4/", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      apiKey: API_KEY,
      modelKey: MODEL_KEY,
      modelInputs: {
        prompt: "a futuristic city at sunset, cyberpunk style, 4k",
        num_inference_steps: 30,
        guidance_scale: 7.5,
        width: 1024,
        height: 1024,
      },
    }),
  });

  const data = await response.json();
  console.log(data);
}

runInference();
```
Step 4: Handle Async Operations
For longer-running tasks, Banana uses an asynchronous pattern. You start a job and then poll for results:
```python
import banana_dev as banana
import time

api_key = "your-api-key"
model_key = "your-model-key"

payload = {
    "prompt": "a detailed portrait of a cyberpunk warrior",
    "num_inference_steps": 50,
}

# Start the async job
start_result = banana.start(api_key, model_key, payload)
call_id = start_result["callID"]
print(f"Job started with callID: {call_id}")

# Poll for completion
while True:
    check_result = banana.check(api_key, call_id)
    if check_result["message"] == "success":
        print("Result:", check_result["modelOutputs"])
        break
    elif check_result["message"] == "error":
        print("Error:", check_result)
        break
    print("Still processing...")
    time.sleep(2)
```
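An unbounded loop like this can hang indefinitely if a job stalls. One way to guard against that is to cap the total wait time. A minimal sketch, where the 120-second deadline is an arbitrary choice rather than a platform limit:

```python
import time
import banana_dev as banana

def poll_until_done(api_key, call_id, timeout_seconds=120, interval=2):
    """Poll banana.check until the job finishes or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        check_result = banana.check(api_key, call_id)
        message = check_result.get("message", "")
        if message == "success":
            return check_result["modelOutputs"]
        if message == "error":
            raise RuntimeError(f"Job {call_id} failed: {check_result}")
        time.sleep(interval)
    raise TimeoutError(f"Job {call_id} did not finish within {timeout_seconds}s")
```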
Step 5: Deploy a Custom Model
You can deploy your own models to Banana. Create a Dockerfile and an app.py:
```python
# app.py - Banana model server
from potassium import Potassium, Request, Response
from transformers import pipeline

app = Potassium("my-model")

@app.init
def init():
    # Load the model once at startup and keep it in the shared context
    model = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
    context = {"model": model}
    return context

@app.handler()
def handler(context: dict, request: Request) -> Response:
    model = context["model"]
    prompt = request.json.get("prompt", "")
    max_tokens = request.json.get("max_tokens", 100)
    result = model(prompt, max_new_tokens=max_tokens)
    return Response(json={"output": result[0]["generated_text"]}, status=200)

if __name__ == "__main__":
    app.serve()
```
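Before deploying, it is worth smoke-testing the server locally. A minimal sketch, assuming Potassium serves the handler at the root path on port 8000 (matching the EXPOSE line in the Dockerfile below):

```python
import requests

# Assumes `python app.py` is already running locally
resp = requests.post(
    "http://localhost:8000/",
    json={"prompt": "Once upon a time", "max_tokens": 50},
)
print(resp.json()["output"])
```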
The Dockerfile:
```dockerfile
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
COPY . .
RUN pip install potassium transformers torch
EXPOSE 8000
CMD ["python", "app.py"]
```
Deploy with the Banana CLI:
```bash
banana deploy
```
API Endpoints Reference
| Endpoint | Method | Description |
|---|---|---|
| /start/v4/ | POST | Start a synchronous inference job |
| /start/v4/async | POST | Start an asynchronous inference job |
| /check/v4/ | POST | Check the status of an async job |
| /cancel/v4/ | POST | Cancel a running job |
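If you prefer to skip the SDK entirely, these endpoints can be called with any HTTP client. A minimal Python sketch using requests; the exact field names for /check/v4/ are an assumption based on how the SDK exposes callID, so verify them against the platform docs:

```python
import requests

BASE_URL = "https://api.banana.dev"

def start_job(api_key, model_key, model_inputs):
    """POST to /start/v4/ and return the parsed JSON response."""
    resp = requests.post(
        f"{BASE_URL}/start/v4/",
        json={"apiKey": api_key, "modelKey": model_key, "modelInputs": model_inputs},
    )
    resp.raise_for_status()
    return resp.json()

def check_job(api_key, call_id):
    """POST to /check/v4/ to look up an async job (field names assumed)."""
    resp = requests.post(
        f"{BASE_URL}/check/v4/",
        json={"apiKey": api_key, "callID": call_id},
    )
    resp.raise_for_status()
    return resp.json()
```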
Request and Response Format
Request body:
```json
{
  "apiKey": "your-api-key",
  "modelKey": "your-model-key",
  "modelInputs": {
    "prompt": "your prompt here",
    "parameter1": "value1",
    "parameter2": "value2"
  },
  "startOnly": false
}
```
Response body (success):
```json
{
  "id": "call-abc123",
  "message": "success",
  "created": 1707235200,
  "apiVersion": "v4",
  "modelOutputs": [
    {
      "image_base64": "iVBORw0KGgo...",
      "seed": 42
    }
  ]
}
```
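Image models typically return the image as a base64 string, as in the response above. A minimal sketch for saving it to disk, assuming the output field is named image_base64 and contains a PNG (field names vary by model):

```python
import base64

def save_image(result, path="output.png"):
    # Assumes the first output carries a base64-encoded PNG under "image_base64"
    image_b64 = result["modelOutputs"][0]["image_base64"]
    with open(path, "wb") as f:
        f.write(base64.b64decode(image_b64))

save_image(result)  # `result` from banana.run(...)
```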
Pricing Comparison
| Platform | Pricing Model | Cold Start | GPU Options |
|---|---|---|---|
| Nano Banana | Per-second billing | Optimized | A100, A10G, T4 |
| Replicate | Per-second billing | Variable | A100, A40, T4 |
| fal.ai | Per-second billing | Fast | A100, H100 |
| Hypereal AI | Per-generation | No cold start | Managed |
| RunPod | Per-second billing | Variable | Wide selection |
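To make per-second billing concrete, here is a back-of-the-envelope calculation. Every number below is hypothetical; check each platform's pricing page for real rates:

```python
# Illustrative numbers only; not actual Nano Banana pricing
rate_per_second = 0.0005      # hypothetical GPU rate in USD
seconds_per_generation = 4.0  # hypothetical inference time per image
generations_per_month = 100_000

monthly_cost = rate_per_second * seconds_per_generation * generations_per_month
print(f"${monthly_cost:,.2f}/month")  # prints $200.00/month
```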
Error Handling Best Practices
Always implement proper error handling and retries:
```python
import banana_dev as banana
import time

def run_with_retry(api_key, model_key, payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = banana.run(api_key, model_key, payload)
            if result.get("message") == "success":
                return result["modelOutputs"]
            if "cold start" in str(result.get("message", "")).lower():
                print(f"Cold start detected, retrying ({attempt + 1}/{max_retries})...")
                time.sleep(5)
                continue
            raise Exception(f"API error: {result.get('message')}")
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")

# Usage
result = run_with_retry(
    api_key="your-api-key",
    model_key="your-model-key",
    payload={"prompt": "a beautiful landscape painting"},
)
```
Webhook Support
For production applications, use webhooks instead of polling:
```python
payload = {
    "prompt": "a cinematic shot of a mountain range",
    "num_inference_steps": 50,
}

result = banana.start(
    api_key,
    model_key,
    payload,
    webhook={
        "url": "https://your-app.com/api/banana-webhook",
        "headers": {"Authorization": "Bearer your-secret"},
    },
)
```
Your webhook endpoint receives the result when the job completes:
```python
# Flask webhook handler
from flask import Flask, request

app = Flask(__name__)

@app.route("/api/banana-webhook", methods=["POST"])
def banana_webhook():
    data = request.json
    call_id = data["id"]
    outputs = data["modelOutputs"]
    # Process the results
    print(f"Job {call_id} completed with outputs: {outputs}")
    return {"status": "ok"}, 200
```
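Because the webhook URL is publicly reachable, the handler should reject requests that do not carry the secret you configured when starting the job. A minimal sketch extending the handler above, checking the same Bearer token set in the webhook config:

```python
import os
from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "your-secret")

@app.route("/api/banana-webhook", methods=["POST"])
def banana_webhook():
    # Reject callers that don't present the shared secret
    if request.headers.get("Authorization") != f"Bearer {WEBHOOK_SECRET}":
        abort(401)
    data = request.json
    print(f"Job {data['id']} completed with outputs: {data['modelOutputs']}")
    return {"status": "ok"}, 200
```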
Troubleshooting
Long cold start times: Pre-warm your model by sending a dummy request on a schedule. Nano Banana's optimized tier reduces cold starts, but they can still occur after periods of inactivity.
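A simple pre-warming approach is a tiny script run on a schedule, for example via cron every few minutes. The payload here is a deliberately cheap, illustrative request; use whatever minimal input your model accepts:

```python
# prewarm.py: run on a schedule (e.g., cron) to keep the container warm
import os
import banana_dev as banana

api_key = os.environ["BANANA_API_KEY"]      # hypothetical env var names
model_key = os.environ["BANANA_MODEL_KEY"]

# One cheap inference call to keep the model resident
banana.run(api_key, model_key, {"prompt": "warmup", "num_inference_steps": 1})
```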
Timeout errors: Increase your client-side timeout and use async mode for inference jobs that take more than 30 seconds.
Rate limiting: Banana applies rate limits based on your plan. If you hit limits, implement exponential backoff or upgrade your plan.
Model deployment failures: Check your Dockerfile for missing dependencies. Use banana logs to view server-side error messages.
Conclusion
Nano Banana provides a solid platform for serverless GPU inference with a simple API and pay-per-second pricing. Whether you are running pre-built image generation models or deploying custom ML models, the integration is straightforward with both Python and JavaScript.
If you want a simpler approach to AI media generation without managing model deployments, Hypereal AI offers a unified API that handles image generation, video creation, lip sync, and more, all with per-generation pricing and zero cold starts. It is ideal for developers who want to focus on building their product rather than managing infrastructure.