Google Gemini 3 API Guide: Getting Started (2026)
A developer's guide to integrating Google's latest Gemini 3 model via API
Google's Gemini 3 represents the next leap in multimodal AI. With improved reasoning, a 2-million-token context window, native tool use, and significantly better code generation, Gemini 3 positions itself as a top choice for developers building AI-powered applications.
This guide covers everything you need to start using the Gemini 3 API: setup, authentication, code examples, pricing, and best practices.
What Is New in Gemini 3
| Feature | Gemini 2.5 Pro | Gemini 3 |
|---|---|---|
| Context window | 1M tokens | 2M tokens |
| Output tokens | 65K | 128K |
| Native tool use | Basic | Advanced (parallel + nested) |
| Reasoning | Strong | Frontier-level |
| Code generation | Strong | Best in class |
| Multimodal | Text, image, audio, video | Text, image, audio, video + 3D |
| Latency (median) | ~3s | ~1.5s |
| Pricing (input) | $1.25/M tokens | $2.00/M tokens |
| Pricing (output) | $5.00/M tokens | $8.00/M tokens |
Gemini 3 is not just an incremental upgrade. The doubled context window, faster inference, and improved tool use make it significantly more capable for complex development tasks.
Step 1: Get Your API Key
You have two options for accessing the Gemini 3 API:
Option A: Google AI Studio (Simpler)
Best for prototyping and personal projects.
- Go to aistudio.google.com/apikey
- Click Create API Key
- Select or create a project
- Copy the key
export GOOGLE_API_KEY="your-api-key-here"
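Rather than hard-coding the key in source, read it from the environment. A minimal Python sketch, assuming the google-generativeai SDK from Step 2 and the GOOGLE_API_KEY variable exported above:
import os
import google.generativeai as genai

# Read the key exported above instead of embedding it in code
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])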
Option B: Vertex AI (Production)
Best for production applications with SLAs and enterprise features.
# Install gcloud CLI
curl https://sdk.cloud.google.com | bash
# Authenticate
gcloud auth application-default login
# Enable the API
gcloud services enable aiplatform.googleapis.com
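With authentication in place, the same call through the Vertex AI Python SDK might look like the following minimal sketch (assuming the google-cloud-aiplatform package, a placeholder project ID and region, and that the model is exposed under the same gemini-3 name on Vertex AI):
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region; substitute your own
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-3")
response = model.generate_content("Explain how transformers work in AI")
print(response.text)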
Step 2: Install the SDK
Python
pip install google-generativeai
JavaScript / TypeScript
npm install @google/generative-ai
Go
go get github.com/google/generative-ai-go
Step 3: Make Your First API Call
Python
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3")
response = model.generate_content("Explain how transformers work in AI")
print(response.text)
JavaScript
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI("YOUR_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3" });
const result = await model.generateContent("Explain how transformers work in AI");
console.log(result.response.text());
cURL
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3:generateContent?key=$GOOGLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{"text": "Explain how transformers work in AI"}]
}]
}'
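In the Python SDK, the response object also reports token usage, which is handy for tracking cost. A small sketch, assuming the usage_metadata fields exposed by the current google-generativeai SDK:
# Token accounting for the request above
usage = response.usage_metadata
print("Input tokens:", usage.prompt_token_count)
print("Output tokens:", usage.candidates_token_count)
print("Total tokens:", usage.total_token_count)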
Step 4: Multi-Turn Conversations
Gemini 3 supports stateful conversations with chat history:
Python
model = genai.GenerativeModel("gemini-3")
chat = model.start_chat(history=[])
# First message
response = chat.send_message("What are the main design patterns in Python?")
print(response.text)
# Follow-up (model remembers context)
response = chat.send_message("Show me an example of the Factory pattern")
print(response.text)
# Another follow-up
response = chat.send_message("Now adapt it to use async/await")
print(response.text)
JavaScript
const chat = model.startChat({ history: [] });
const result1 = await chat.sendMessage("What are the main design patterns in Python?");
console.log(result1.response.text());
const result2 = await chat.sendMessage("Show me an example of the Factory pattern");
console.log(result2.response.text());
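You can also seed a chat with earlier turns instead of starting from an empty history. A brief Python sketch, assuming the dict-based history format the SDK accepts:
# Resume a conversation by replaying earlier turns
chat = model.start_chat(history=[
    {"role": "user", "parts": ["What are the main design patterns in Python?"]},
    {"role": "model", "parts": ["The most common are Singleton, Factory, Observer, and Strategy."]},
])
response = chat.send_message("Show me an example of the Factory pattern")
print(response.text)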
Step 5: Multimodal Input
Gemini 3 can process text, images, audio, and video in a single request.
Image Analysis
import PIL.Image
model = genai.GenerativeModel("gemini-3")
image = PIL.Image.open("screenshot.png")
response = model.generate_content([
"Analyze this UI screenshot. Identify usability issues and suggest improvements.",
image
])
print(response.text)
File Upload (for large files)
# Upload a video file
video_file = genai.upload_file("demo.mp4", mime_type="video/mp4")
# Wait for processing
import time
while video_file.state.name == "PROCESSING":
    time.sleep(5)
    video_file = genai.get_file(video_file.name)
# Analyze the video
response = model.generate_content([
"Summarize what happens in this video and provide timestamps for key events.",
video_file
])
print(response.text)
Step 6: Function Calling (Tool Use)
Gemini 3's improved tool use lets you define functions that the model can call to get real-time data or perform actions.
import google.generativeai as genai
# Define tools
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get the current weather for a given city."""
    # Your actual API call here
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

def search_products(query: str, max_results: int = 5) -> list:
    """Search for products in the catalog."""
    # Your database query here
    return [{"name": "Widget Pro", "price": 29.99}]

model = genai.GenerativeModel(
    "gemini-3",
    tools=[get_weather, search_products]
)
# Let the SDK execute your functions and return the results to the model
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message(
"What's the weather in Tokyo, and find me some umbrella products if it's raining"
)
print(response.text)
Gemini 3 can call multiple functions in parallel and chain results, making it ideal for building AI agents.
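If you prefer to run tools yourself instead of relying on automatic function calling, you can inspect the function-call parts the model returns. A rough sketch, assuming the response shapes of the current google-generativeai SDK:
# Manual tool handling: inspect what the model wants to call, then execute it yourself
chat = model.start_chat()  # automatic function calling not enabled here
response = chat.send_message("What's the weather in Tokyo and in Osaka?")

for part in response.candidates[0].content.parts:
    fn = part.function_call
    if fn.name:  # text-only parts carry an empty function_call
        print("Model requested:", fn.name, dict(fn.args))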
Step 7: Structured Output (JSON Mode)
Force Gemini 3 to return structured JSON responses:
import google.generativeai as genai
from pydantic import BaseModel
class CodeReview(BaseModel):
    file: str
    issues: list[dict]
    overall_score: int
    summary: str

model = genai.GenerativeModel(
    "gemini-3",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=CodeReview
    )
)
response = model.generate_content(
"Review this Python code for bugs and style issues:\n\n"
"def calc(x,y):\n result=x/y\n return result"
)
print(response.text)
The response will be valid JSON conforming to your schema:
{
"file": "inline",
"issues": [
{"line": 2, "severity": "high", "message": "No handling for division by zero"},
{"line": 1, "severity": "low", "message": "Missing type hints for parameters"},
{"line": 2, "severity": "low", "message": "Missing spaces around operator per PEP 8"}
],
"overall_score": 4,
"summary": "The function has a critical division-by-zero risk and minor style issues."
}
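Because the output conforms to the schema, you can load it straight back into the Pydantic model. A short sketch:
import json

# Validate and parse the structured response
review = CodeReview(**json.loads(response.text))
print(review.overall_score, review.summary)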
Step 8: Streaming Responses
For better user experience, stream responses token by token:
Python
model = genai.GenerativeModel("gemini-3")
response = model.generate_content(
"Write a comprehensive guide to Python decorators",
stream=True
)
for chunk in response:
    print(chunk.text, end="", flush=True)
JavaScript
const result = await model.generateContentStream(
"Write a comprehensive guide to Python decorators"
);
for await (const chunk of result.stream) {
process.stdout.write(chunk.text());
}
Pricing Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Free Tier |
|---|---|---|---|
| Gemini 3 | $2.00 | $8.00 | Yes (rate limited) |
| Gemini 2.5 Pro | $1.25 | $5.00 | Yes (rate limited) |
| Gemini 2.5 Flash | $0.15 | $0.60 | Yes (generous) |
| GPT-5 | $3.00 | $15.00 | No |
| Claude Opus 4 | $15.00 | $75.00 | No |
| Claude Sonnet 4 | $3.00 | $15.00 | No |
Gemini 3 offers strong value at its price point, especially considering the 2M token context window and free tier availability.
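As a rough worked example from the table: a Gemini 3 request with 200K input tokens and 2K output tokens costs about (200,000 / 1,000,000) × $2.00 + (2,000 / 1,000,000) × $8.00 ≈ $0.42. A small helper to estimate this in Python (prices hard-coded from the table above, so update them if Google's pricing changes):
# Per-million-token prices from the table above
GEMINI_3_INPUT_PER_M = 2.00
GEMINI_3_OUTPUT_PER_M = 8.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single Gemini 3 request."""
    return (input_tokens / 1_000_000) * GEMINI_3_INPUT_PER_M + \
           (output_tokens / 1_000_000) * GEMINI_3_OUTPUT_PER_M

print(f"${estimate_cost(200_000, 2_000):.2f}")  # ~$0.42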
Best Practices
Context Window Management
With 2M tokens available, you can fit entire codebases in context. But more is not always better:
# Good: Provide relevant context
response = model.generate_content([
"Here is the relevant source file:\n\n" + source_code,
"Here are the failing test cases:\n\n" + test_output,
"Fix the code to pass all tests."
])
# Avoid: Dumping everything
# Don't send 500 files when the bug is in one function
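Before sending a large prompt, you can measure it with the SDK's count_tokens method. A quick sketch reusing the source_code and test_output variables from the example above:
# Check prompt size against the 2M-token window before sending
token_count = model.count_tokens([source_code, test_output])
print("Prompt tokens:", token_count.total_tokens)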
Error Handling
import google.generativeai as genai
from google.api_core import exceptions
try:
    response = model.generate_content("Your prompt here")
    print(response.text)
except exceptions.ResourceExhausted:
    print("Rate limit hit. Wait and retry.")
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.PermissionDenied:
    print("Check your API key and permissions.")
Rate Limit Handling with Exponential Backoff
import time
import random
from google.api_core import exceptions
def call_with_backoff(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except exceptions.ResourceExhausted:
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
    raise Exception("Max retries exceeded")
Frequently Asked Questions
Is there a free tier for Gemini 3?
Yes. Google AI Studio provides free, rate-limited access to Gemini 3, similar to the Gemini 2.5 Pro free tier. Expect roughly 2-5 requests per minute (RPM) and 50-100 requests per day (RPD).
Can I use Gemini 3 in Cursor or VS Code?
Yes. Use the OpenAI-compatible endpoint at https://generativelanguage.googleapis.com/v1beta/openai with your API key. Configure it as a custom model in Cursor or use it with Continue.dev in VS Code.
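For example, the OpenAI Python SDK can be pointed at that endpoint. A minimal sketch (check the exact base URL path and model name against the current docs):
from openai import OpenAI

# Reuse your Gemini API key against the OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

completion = client.chat.completions.create(
    model="gemini-3",
    messages=[{"role": "user", "content": "Explain how transformers work in AI"}],
)
print(completion.choices[0].message.content)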
What is the difference between Google AI Studio and Vertex AI?
Google AI Studio is simpler and better for prototyping. Vertex AI is the production platform with SLAs, VPC integration, data residency controls, and enterprise support.
Does Gemini 3 support fine-tuning?
Google offers fine-tuning for Gemini models through Vertex AI. Check the latest documentation for Gemini 3 fine-tuning availability and supported methods.
Wrapping Up
The Gemini 3 API combines a massive context window, strong multimodal capabilities, and competitive pricing to make it one of the best options for developers building AI features in 2026. The free tier and straightforward SDK make it easy to get started, while Vertex AI provides a clear path to production.
If your application needs AI-generated media like images, videos, or talking avatars alongside Gemini's language capabilities, Hypereal AI provides a unified media generation API with competitive pricing. Sign up free to explore the platform.