How to Use Nano Banana via API (2026)
Complete API integration guide for Nano Banana AI inference
Banana (now commonly referred to as Nano Banana in its latest iteration) is a serverless GPU inference platform that lets developers deploy and run machine learning models via a simple API. If you need fast, scalable inference for image generation, language models, or custom ML models without managing GPU infrastructure, Banana provides a straightforward solution.
This guide covers everything you need to know to integrate Nano Banana into your application, with working code examples in Python and JavaScript.
What Is Nano Banana?
Nano Banana is the lightweight, cost-optimized tier of the Banana inference platform. It focuses on:
- Cold start optimization: Models spin up faster with pre-warmed containers
- Pay-per-second billing: You only pay for actual compute time
- Simple REST API: Standard HTTP requests, no SDKs required
- Custom model support: Deploy any model from Hugging Face or your own checkpoints
Prerequisites
Before getting started, you need:
- A Banana account at banana.dev
- An API key from your Banana dashboard
- A model ID (either a pre-built model or one you have deployed)
- Python 3.8+ or Node.js 18+ for the code examples
Step 1: Get Your API Credentials
Sign up at banana.dev and navigate to the API Keys section in your dashboard. You will need:
- API Key: Your authentication token
- Model Key: The unique identifier for the model you want to call
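In production you should keep these credentials out of source code. A minimal sketch, assuming you export them as environment variables; the variable names here are illustrative, not required by the platform:

```python
import os

# Hypothetical variable names; set these in your shell or deployment config
api_key = os.environ["BANANA_API_KEY"]
model_key = os.environ["BANANA_MODEL_KEY"]
```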
Step 2: Make Your First API Call (Python)
Install the Banana Python SDK:
```bash
pip install banana-dev
```
Here is a basic example calling a text-to-image model:
```python
import banana_dev as banana

api_key = "your-api-key"
model_key = "your-model-key"

payload = {
    "prompt": "a futuristic city at sunset, cyberpunk style, 4k",
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024,
}

result = banana.run(api_key, model_key, payload)
print(result["modelOutputs"])
```
Step 3: Make Your First API Call (JavaScript)
Using the REST API directly with fetch:
```javascript
const API_KEY = "your-api-key";
const MODEL_KEY = "your-model-key";

async function runInference() {
  const response = await fetch("https://api.banana.dev/start/v4/", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      apiKey: API_KEY,
      modelKey: MODEL_KEY,
      modelInputs: {
        prompt: "a futuristic city at sunset, cyberpunk style, 4k",
        num_inference_steps: 30,
        guidance_scale: 7.5,
        width: 1024,
        height: 1024,
      },
    }),
  });

  const data = await response.json();
  console.log(data);
}

runInference();
```
Step 4: Handle Async Operations
For longer-running tasks, Banana uses an asynchronous pattern. You start a job and then poll for results:
```python
import banana_dev as banana
import time

api_key = "your-api-key"
model_key = "your-model-key"

payload = {
    "prompt": "a detailed portrait of a cyberpunk warrior",
    "num_inference_steps": 50,
}

# Start the async job
start_result = banana.start(api_key, model_key, payload)
call_id = start_result["callID"]
print(f"Job started with callID: {call_id}")

# Poll for completion
while True:
    check_result = banana.check(api_key, call_id)
    if check_result["message"] == "success":
        print("Result:", check_result["modelOutputs"])
        break
    elif check_result["message"] == "error":
        print("Error:", check_result)
        break
    print("Still processing...")
    time.sleep(2)
```
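An unbounded loop like this can hang indefinitely if a job stalls. One way to guard against that is to cap the total wait time. A minimal sketch, where the 120-second deadline is an arbitrary choice rather than a platform limit:

```python
import time
import banana_dev as banana

def poll_until_done(api_key, call_id, timeout_seconds=120, interval=2):
    """Poll banana.check until the job finishes or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        check_result = banana.check(api_key, call_id)
        message = check_result.get("message", "")
        if message == "success":
            return check_result["modelOutputs"]
        if message == "error":
            raise RuntimeError(f"Job {call_id} failed: {check_result}")
        time.sleep(interval)
    raise TimeoutError(f"Job {call_id} did not finish within {timeout_seconds}s")
```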
Step 5: Deploy a Custom Model
You can deploy your own models to Banana. Create a Dockerfile and an app.py:
```python
# app.py - Banana model server
from potassium import Potassium, Request, Response
from transformers import pipeline

app = Potassium("my-model")

@app.init
def init():
    # Load the model once at startup and keep it in the shared context
    model = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
    context = {"model": model}
    return context

@app.handler()
def handler(context: dict, request: Request) -> Response:
    model = context["model"]
    prompt = request.json.get("prompt", "")
    max_tokens = request.json.get("max_tokens", 100)
    result = model(prompt, max_new_tokens=max_tokens)
    return Response(json={"output": result[0]["generated_text"]}, status=200)

if __name__ == "__main__":
    app.serve()
```
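Before deploying, it is worth smoke-testing the server locally. A minimal sketch, assuming Potassium serves the handler at the root path on port 8000 (matching the EXPOSE line in the Dockerfile below):

```python
import requests

# Assumes `python app.py` is already running locally
resp = requests.post(
    "http://localhost:8000/",
    json={"prompt": "Once upon a time", "max_tokens": 50},
)
print(resp.json()["output"])
```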
The Dockerfile:
```dockerfile
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
COPY . .
RUN pip install potassium transformers torch
EXPOSE 8000
CMD ["python", "app.py"]
```
Deploy with the Banana CLI:
```bash
banana deploy
```
API Endpoints Reference
| Endpoint | Method | Description |
|---|---|---|
| /start/v4/ | POST | Start a synchronous inference job |
| /start/v4/async | POST | Start an asynchronous inference job |
| /check/v4/ | POST | Check the status of an async job |
| /cancel/v4/ | POST | Cancel a running job |
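If you prefer to skip the SDK entirely, these endpoints can be called with any HTTP client. A minimal Python sketch using requests; the exact field names for /check/v4/ are an assumption based on how the SDK exposes callID, so verify them against the platform docs:

```python
import requests

BASE_URL = "https://api.banana.dev"

def start_job(api_key, model_key, model_inputs):
    """POST to /start/v4/ and return the parsed JSON response."""
    resp = requests.post(
        f"{BASE_URL}/start/v4/",
        json={"apiKey": api_key, "modelKey": model_key, "modelInputs": model_inputs},
    )
    resp.raise_for_status()
    return resp.json()

def check_job(api_key, call_id):
    """POST to /check/v4/ to look up an async job (field names assumed)."""
    resp = requests.post(
        f"{BASE_URL}/check/v4/",
        json={"apiKey": api_key, "callID": call_id},
    )
    resp.raise_for_status()
    return resp.json()
```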
Request and Response Format
Request body:
```json
{
  "apiKey": "your-api-key",
  "modelKey": "your-model-key",
  "modelInputs": {
    "prompt": "your prompt here",
    "parameter1": "value1",
    "parameter2": "value2"
  },
  "startOnly": false
}
```
Response body (success):
```json
{
  "id": "call-abc123",
  "message": "success",
  "created": 1707235200,
  "apiVersion": "v4",
  "modelOutputs": [
    {
      "image_base64": "iVBORw0KGgo...",
      "seed": 42
    }
  ]
}
```
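Image models typically return the image as a base64 string, as in the response above. A minimal sketch for saving it to disk, assuming the output field is named image_base64 and contains a PNG (field names vary by model):

```python
import base64

def save_image(result, path="output.png"):
    # Assumes the first output carries a base64-encoded PNG under "image_base64"
    image_b64 = result["modelOutputs"][0]["image_base64"]
    with open(path, "wb") as f:
        f.write(base64.b64decode(image_b64))

save_image(result)  # `result` from banana.run(...)
```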
Pricing Comparison
| Platform | Pricing Model | Cold Start | GPU Options |
|---|---|---|---|
| Nano Banana | Per-second billing | Optimized | A100, A10G, T4 |
| Replicate | Per-second billing | Variable | A100, A40, T4 |
| fal.ai | Per-second billing | Fast | A100, H100 |
| Hypereal AI | Per-generation | No cold start | Managed |
| RunPod | Per-second billing | Variable | Wide selection |
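To make per-second billing concrete, here is a back-of-the-envelope calculation. Every number below is hypothetical; check each platform's pricing page for real rates:

```python
# Illustrative numbers only; not actual Nano Banana pricing
rate_per_second = 0.0005      # hypothetical GPU rate in USD
seconds_per_generation = 4.0  # hypothetical inference time per image
generations_per_month = 100_000

monthly_cost = rate_per_second * seconds_per_generation * generations_per_month
print(f"${monthly_cost:,.2f}/month")  # prints $200.00/month
```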
Error Handling Best Practices
Always implement proper error handling and retries:
```python
import banana_dev as banana
import time

def run_with_retry(api_key, model_key, payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = banana.run(api_key, model_key, payload)
            if result.get("message") == "success":
                return result["modelOutputs"]
            if "cold start" in str(result.get("message", "")).lower():
                print(f"Cold start detected, retrying ({attempt + 1}/{max_retries})...")
                time.sleep(5)
                continue
            raise Exception(f"API error: {result.get('message')}")
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")

# Usage
result = run_with_retry(
    api_key="your-api-key",
    model_key="your-model-key",
    payload={"prompt": "a beautiful landscape painting"},
)
```
Webhook Support
For production applications, use webhooks instead of polling:
```python
payload = {
    "prompt": "a cinematic shot of a mountain range",
    "num_inference_steps": 50,
}

result = banana.start(
    api_key,
    model_key,
    payload,
    webhook={
        "url": "https://your-app.com/api/banana-webhook",
        "headers": {"Authorization": "Bearer your-secret"},
    },
)
```
Your webhook endpoint receives the result when the job completes:
```python
# Flask webhook handler
from flask import Flask, request

app = Flask(__name__)

@app.route("/api/banana-webhook", methods=["POST"])
def banana_webhook():
    data = request.json
    call_id = data["id"]
    outputs = data["modelOutputs"]
    # Process the results
    print(f"Job {call_id} completed with outputs: {outputs}")
    return {"status": "ok"}, 200
```
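Because the webhook URL is publicly reachable, the handler should reject requests that do not carry the secret you configured when starting the job. A minimal sketch extending the handler above, checking the same Bearer token set in the webhook config:

```python
import os
from flask import Flask, request, abort

app = Flask(__name__)
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "your-secret")

@app.route("/api/banana-webhook", methods=["POST"])
def banana_webhook():
    # Reject callers that don't present the shared secret
    if request.headers.get("Authorization") != f"Bearer {WEBHOOK_SECRET}":
        abort(401)
    data = request.json
    print(f"Job {data['id']} completed with outputs: {data['modelOutputs']}")
    return {"status": "ok"}, 200
```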
Troubleshooting
Long cold start times: Pre-warm your model by sending a dummy request on a schedule. Nano Banana's optimized tier reduces cold starts, but they can still occur after periods of inactivity.
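A simple pre-warming approach is a tiny script run on a schedule, for example via cron every few minutes. The payload here is a deliberately cheap, illustrative request; use whatever minimal input your model accepts:

```python
# prewarm.py: run on a schedule (e.g., cron) to keep the container warm
import os
import banana_dev as banana

api_key = os.environ["BANANA_API_KEY"]      # hypothetical env var names
model_key = os.environ["BANANA_MODEL_KEY"]

# One cheap inference call to keep the model resident
banana.run(api_key, model_key, {"prompt": "warmup", "num_inference_steps": 1})
```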
Timeout errors: Increase your client-side timeout and use async mode for inference jobs that take more than 30 seconds.
Rate limiting: Banana applies rate limits based on your plan. If you hit limits, implement exponential backoff or upgrade your plan.
Model deployment failures: Check your Dockerfile for missing dependencies. Use banana logs to view server-side error messages.
Conclusion
Nano Banana provides a solid platform for serverless GPU inference with a simple API and pay-per-second pricing. Whether you are running pre-built image generation models or deploying custom ML models, the integration is straightforward with both Python and JavaScript.
If you want a simpler approach to AI media generation without managing model deployments, Hypereal AI offers a unified API that handles image generation, video creation, lip sync, and more, all with per-generation pricing and zero cold starts. It is ideal for developers who want to focus on building their product rather than managing infrastructure.