How to Use GLM-4.7 API: Developer Guide (2026)
Integrate Zhipu AI's latest model into your applications
GLM-4.7 is the latest large language model from Zhipu AI, one of China's leading AI companies. It offers competitive performance against Western models at significantly lower pricing, making it an attractive option for developers building cost-sensitive applications. This guide covers everything you need to integrate the GLM-4.7 API into your projects.
What Is GLM-4.7?
GLM-4.7 is a general-purpose large language model developed by Zhipu AI (also known as ChatGLM or BigModel). The GLM (General Language Model) family uses a unique autoregressive blank-filling architecture that distinguishes it from the GPT-style decoder-only approach.
GLM-4.7 Model Variants
| Model | Parameters | Context Window | Best For |
|---|---|---|---|
| GLM-4.7 | Undisclosed (large) | 128K tokens | Complex reasoning, long documents |
| GLM-4.7-Flash | Undisclosed (smaller) | 128K tokens | Fast responses, high throughput |
| GLM-4.7-Vision | Undisclosed | 128K tokens | Text + image understanding |
| GLM-4.7-Code | Undisclosed | 32K tokens | Code generation and analysis |
Performance Benchmarks
| Benchmark | GLM-4.7 | GPT-4o | Claude Sonnet 4 | Gemini 2.0 Flash |
|---|---|---|---|---|
| MMLU | 87.2 | 88.7 | 88.4 | 85.1 |
| HumanEval | 85.4 | 90.2 | 92.0 | 84.1 |
| MATH | 68.1 | 76.6 | 73.8 | 70.2 |
| MT-Bench | 9.1 | 9.3 | 9.2 | 8.8 |
GLM-4.7 is competitive on most benchmarks and excels particularly in Chinese language tasks, where it outperforms most Western models.
Step 1: Create a Zhipu AI Account
- Navigate to open.bigmodel.cn (Zhipu AI's developer platform)
- Click Register and create an account with your email
- Complete email verification
- Navigate to API Keys in your dashboard
- Click Create API Key and copy the key
Note: The platform interface is available in both Chinese and English. New accounts typically receive free credits for testing.
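Avoid hardcoding the key in source code. One common pattern is to read it from an environment variable; the helper below is a small sketch (the variable name `ZHIPUAI_API_KEY` is a convention chosen here, not one mandated by the platform):

```python
import os

def load_api_key(var: str = "ZHIPUAI_API_KEY") -> str:
    """Fetch the Zhipu API key from the environment; fail loudly if missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable first")
    return key

# Later: client = ZhipuAI(api_key=load_api_key())
```

Set the variable in your shell (`export ZHIPUAI_API_KEY=...`) before running your application.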
Step 2: Install the SDK
Python SDK
```bash
pip install zhipuai
```
Node.js / TypeScript
The GLM API follows the OpenAI-compatible format, so you can use the OpenAI SDK with a custom base URL:
```bash
npm install openai
```
Direct HTTP (No SDK Required)
The API uses standard REST endpoints, so you can also use curl or any HTTP client.
Step 3: Make Your First API Call
Python (Official SDK)
```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key-here")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between REST and GraphQL APIs."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
```
Python (OpenAI-Compatible)
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-zhipu-api-key",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted arrays."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
```
Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-zhipu-api-key",
  baseURL: "https://open.bigmodel.cn/api/paas/v4/",
});

async function main() {
  const response = await client.chat.completions.create({
    model: "glm-4.7",
    messages: [
      { role: "system", content: "You are a helpful coding assistant." },
      { role: "user", content: "Write a TypeScript function that debounces any function." },
    ],
    temperature: 0.7,
    max_tokens: 1024,
  });
  console.log(response.choices[0].message.content);
}

main();
```
cURL
```bash
curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
```
Step 4: Streaming Responses
For real-time output, enable streaming:
Python
```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

stream = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Write a detailed guide on Docker networking."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Node.js
```typescript
const stream = await client.chat.completions.create({
  model: "glm-4.7",
  messages: [{ role: "user", content: "Explain Kubernetes pods in detail." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
```
Step 5: Vision (Image Understanding)
Use the GLM-4.7-Vision model to analyze images:
```python
import base64
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

# Read and encode the image as base64
with open("diagram.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-4.7-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this architecture diagram and identify any potential issues."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_base64}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```
You can also pass image URLs directly:
```python
response = client.chat.completions.create(
    model="glm-4.7-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
)
```
Step 6: Function Calling (Tool Use)
GLM-4.7 supports function calling for building agentic applications:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'Beijing'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "What is the weather like in Shanghai today?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
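Once the model returns `tool_calls`, your code executes the requested function and sends the result back in a `tool` message so the model can produce its final answer. A minimal sketch of that round trip, with a stubbed `get_weather` (a hypothetical implementation; a real one would call a weather service):

```python
import json

# Stub implementation of the get_weather tool declared above.
def get_weather(city, unit="celsius"):
    return {"city": city, "temp": 22, "unit": unit}  # stand-in data

TOOLS = {"get_weather": get_weather}

def execute_tool_call(tool_call):
    """Dispatch the function the model requested; return a JSON result string."""
    args = json.loads(tool_call.function.arguments)
    result = TOOLS[tool_call.function.name](**args)
    return json.dumps(result)

# Feed the result back so the model can answer in natural language:
# messages.append(message)  # the assistant turn containing tool_calls
# messages.append({
#     "role": "tool",
#     "tool_call_id": tool_call.id,
#     "content": execute_tool_call(tool_call),
# })
# final = client.chat.completions.create(model="glm-4.7", messages=messages)
```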
API Reference
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v4/chat/completions` | POST | Chat completions (text and vision) |
| `/v4/embeddings` | POST | Text embeddings |
| `/v4/files` | POST | File upload for context |
| `/v4/fine-tuning/jobs` | POST | Create fine-tuning job |
| `/v4/images/generations` | POST | Image generation (CogView) |
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Model ID (`glm-4.7`, `glm-4.7-flash`, etc.) |
| `messages` | array | required | Conversation messages |
| `temperature` | float | 0.95 | Sampling temperature (0.0 - 1.0) |
| `top_p` | float | 0.7 | Nucleus sampling parameter |
| `max_tokens` | integer | 1024 | Maximum response length |
| `stream` | boolean | false | Enable streaming |
| `tools` | array | null | Function definitions for tool use |
| `tool_choice` | string | "auto" | Tool selection strategy |
Pricing
GLM-4.7's pricing is one of its strongest advantages:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GLM-4.7 | $0.50 | $1.50 |
| GLM-4.7-Flash | $0.05 | $0.15 |
| GLM-4.7-Vision | $0.60 | $1.80 |
| GLM-4.7-Code | $0.40 | $1.20 |
Cost Comparison
| Model | Input Cost | Output Cost | vs. GLM-4.7 |
|---|---|---|---|
| GLM-4.7 | $0.50/1M | $1.50/1M | -- |
| GPT-4o | $2.50/1M | $10.00/1M | 5-7x more expensive |
| Claude Sonnet 4 | $3.00/1M | $15.00/1M | 6-10x more expensive |
| Gemini 2.0 Flash | $0.075/1M | $0.30/1M | Cheaper than GLM-4.7, but pricier than GLM-4.7-Flash |
For cost-sensitive applications, GLM-4.7-Flash is particularly compelling at just $0.05 per million input tokens.
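To make the rates concrete, here is a quick back-of-the-envelope estimate using the table above (`estimate_cost` is a hypothetical helper written for this guide, not part of any SDK):

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """USD cost of one request, with rates quoted per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 1,000 GLM-4.7-Flash requests, each ~2,000 input / ~500 output tokens:
total = 1000 * estimate_cost(2000, 500, input_rate=0.05, output_rate=0.15)
print(f"${total:.4f}")  # under $0.20 for the whole batch
```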
Best Practices
1. Use the Right Model for the Task
```python
# Use GLM-4.7 for complex reasoning
complex_response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Analyze this legal document..."}]
)

# Use GLM-4.7-Flash for simple, high-volume tasks
simple_response = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[{"role": "user", "content": "Classify this email as spam or not spam."}]
)
```
2. Leverage the 128K Context Window
GLM-4.7 supports up to 128K tokens of context. Use it for:
- Analyzing long documents without chunking
- Multi-turn conversations with full history
- Processing large codebases in a single request
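Before sending a very long input, a cheap pre-flight check helps avoid context-overflow errors. The ~4 characters per token ratio below is a rough English-text heuristic assumed for illustration, not the actual GLM tokenizer:

```python
def fits_in_context(text: str, context_limit: int = 128_000,
                    reserved_output: int = 4_096) -> bool:
    """Rough pre-flight check: estimate tokens at ~4 chars each and leave
    room for the response before hitting the 128K context limit."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output <= context_limit
```

If the check fails, fall back to chunking or summarizing the document before the request.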
3. Handle Errors Gracefully
```python
# Note: exact exception class names can vary between zhipuai SDK versions;
# check the exports of the version you have installed.
from zhipuai import ZhipuAI, APIError, RateLimitError

client = ZhipuAI(api_key="your-api-key")

try:
    response = client.chat.completions.create(
        model="glm-4.7",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limited. Wait and retry.")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
```
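For production traffic, it helps to wrap calls in retry logic with exponential backoff. A minimal sketch (the broad `except Exception` is a deliberate placeholder; narrow it to the rate-limit exception your SDK version actually raises):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # narrow to the SDK's rate-limit error in real code
            if attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x... the base delay, with up to 100% random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# result = with_backoff(lambda: client.chat.completions.create(
#     model="glm-4.7",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```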
4. Optimize for Chinese Language Tasks
GLM-4.7 is particularly strong for Chinese language processing. If your application serves Chinese-speaking users, it may outperform GPT-4o and Claude:
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a professional Chinese copywriter."},
        {"role": "user", "content": "Write marketing copy for a new smartphone launch targeting young professionals in China."}
    ]
)
```
Limitations
- International latency: Servers are primarily in China, so users outside Asia may experience higher latency.
- Content filtering: Chinese AI models have strict content moderation aligned with Chinese regulations.
- Documentation: Primary documentation is in Chinese, though English documentation is available and improving.
- Ecosystem: Fewer third-party integrations compared to OpenAI or Anthropic models.
Conclusion
GLM-4.7 offers a compelling combination of strong performance and aggressive pricing that makes it worth considering for any developer building AI-powered applications, especially those serving Chinese-speaking markets or running cost-sensitive workloads. The OpenAI-compatible API format makes integration straightforward even if you are migrating from GPT-4.
For developers who need AI-powered media generation alongside their LLM workflows -- such as video creation, talking avatars, or voice synthesis -- Hypereal AI provides affordable pay-as-you-go API access to state-of-the-art generative AI models, making it easy to add visual and audio AI capabilities to any application.