How to Use the GLM-4.7 API: A Developer's Guide (2026)

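GLM-4.7 is the latest large language model from Zhipu AI, one of China's leading AI companies. Its performance is comparable to mainstream Western models at a significantly lower price, which makes it an attractive choice for developers building cost-sensitive applications. This guide covers what you need to integrate the GLM-4.7 API into a project.

What Is GLM-4.7?

GLM-4.7 is a general-purpose large language model developed by Zhipu AI, whose products and platform also appear under the ChatGLM and BigModel names. The GLM (General Language Model) series grew out of an autoregressive blank-filling pretraining approach, which set it apart from the GPT-style decoder-only approach.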

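GLM-4.7 Model Variants

Model	Parameters	Context Window	Best For
GLM-4.7	Undisclosed (large)	128K tokens	Complex reasoning, long-document processing
GLM-4.7-Flash	Undisclosed (smaller)	128K tokens	Fast responses, high throughput
GLM-4.7-Vision	Undisclosed	128K tokens	Text + image understanding
GLM-4.7-Code	Undisclosed	32K tokens	Code generation and analysis

Performance Benchmarks

Benchmark	GLM-4.7	GPT-4o	Claude Sonnet 4	Gemini 2.0 Flash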
MMLU	87.2	88.7	88.4	85.1
HumanEval	85.4	90.2	92.0	84.1
MATH	68.1	76.6	73.8	70.2
MT-Bench	9.1	9.3	9.2	8.8

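GLM-4.7 is competitive on most of these benchmarks. It is within 2 points of GPT-4o and Claude Sonnet 4 on MMLU and within 0.2 on MT-Bench, trails them by roughly 5 to 9 points on HumanEval and MATH, and edges out Gemini 2.0 Flash on everything except MATH. The table does not include Chinese-language benchmarks, which is where GLM-4.7 is said to stand out and to outperform most Western models.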
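
To make the comparison concrete, the gaps can be computed directly from the table. The following is a minimal sketch that uses only the scores listed above; the `SCORES` dictionary and the `gap_to_best` helper are names introduced here for illustration.

```python
# Scores copied from the benchmark table above.
SCORES = {
    "MMLU":      {"GLM-4.7": 87.2, "GPT-4o": 88.7, "Claude Sonnet 4": 88.4, "Gemini 2.0 Flash": 85.1},
    "HumanEval": {"GLM-4.7": 85.4, "GPT-4o": 90.2, "Claude Sonnet 4": 92.0, "Gemini 2.0 Flash": 84.1},
    "MATH":      {"GLM-4.7": 68.1, "GPT-4o": 76.6, "Claude Sonnet 4": 73.8, "Gemini 2.0 Flash": 70.2},
    "MT-Bench":  {"GLM-4.7": 9.1,  "GPT-4o": 9.3,  "Claude Sonnet 4": 9.2,  "Gemini 2.0 Flash": 8.8},
}


def gap_to_best(benchmark: str, model: str = "GLM-4.7") -> tuple[str, float]:
    """Return (best model, points by which `model` trails it) for one benchmark."""
    row = SCORES[benchmark]
    best = max(row, key=row.get)
    return best, round(row[best] - row[model], 1)


if __name__ == "__main__":
    for name in SCORES:
        best, gap = gap_to_best(name)
        print(f"{name}: best is {best}, GLM-4.7 trails by {gap}")
```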

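Step 1: Create a Zhipu AI Account

1. Go to open.bigmodel.cn (the Zhipu AI open platform).
2. Click Sign Up and create an account with your email address.
3. Complete email verification.
4. Open API Keys in the console.
5. Click Create API Key and copy the key.

Note: The platform interface is available in both Chinese and English. New accounts typically receive free credits for testing.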
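
Once the key is copied, it is usually safer to keep it out of source code. The sketch below reads it from an environment variable instead. The variable name `ZHIPUAI_API_KEY` and the `load_api_key` helper are assumptions made for this example, not something the guide prescribes.

```python
import os

# ZHIPUAI_API_KEY is an assumed variable name chosen for this sketch.
ENV_VAR = "ZHIPUAI_API_KEY"


def load_api_key(env=None) -> str:
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable to your API key.")
    return key
```

The result can then be passed wherever the examples in this guide use a literal key, for example `ZhipuAI(api_key=load_api_key())`.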

Step 2: Install the SDK

Python SDK

pip install zhipuai

Node.js / TypeScript

The GLM API follows the OpenAI-compatible format, so you can use the OpenAI SDK with a custom base URL:

npm install openai

Direct HTTP (No SDK)

The API exposes standard REST endpoints, so you can also use curl or any HTTP client.

Step 3: Make Your First API Call

Python (Official SDK)

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key-here")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between REST and GraphQL APIs."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)

Python (OpenAI-Compatible)

from openai import OpenAI

client = OpenAI(
    api_key="your-zhipu-api-key",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted arrays."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)

Node.js / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-zhipu-api-key",
  baseURL: "https://open.bigmodel.cn/api/paas/v4/",
});

async function main() {
  const response = await client.chat.completions.create({
    model: "glm-4.7",
    messages: [
      { role: "system", content: "You are a helpful coding assistant." },
      { role: "user", content: "Write a TypeScript function that debounces any function." },
    ],
    temperature: 0.7,
    max_tokens: 1024,
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);

cURL

curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
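
For environments without an SDK or curl, the same request can be put together with Python's standard library. This is a minimal sketch: the URL and JSON body mirror the cURL command above, while `build_request` and `send` are helper names introduced here. Reading the reply from `choices[0].message.content` in the raw JSON assumes the HTTP response has the same shape as the SDK response objects.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"


def build_request(api_key: str, prompt: str, model: str = "glm-4.7") -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the reply text (assumed response shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Splitting construction from sending keeps the request easy to inspect or log before any network traffic happens.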

Step 4: Streaming Responses

For real-time output, enable streaming:

Python

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

stream = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Write a detailed guide on Docker networking."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js

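// Reuses the `client` created in Step 3. Top-level await requires an ES module;
// otherwise wrap this in an async function.
const stream = await client.chat.completions.create({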
  model: "glm-4.7",
  messages: [{ role: "user", content: "Explain Kubernetes pods in detail." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
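
Streaming code often needs the complete text as well as the live output, for example to store the reply in the conversation history. The sketch below collects the deltas while optionally echoing them. `collect_stream` and `fake_chunk` are names introduced here; `fake_chunk` is only an offline stand-in that mimics the `chunk.choices[0].delta.content` shape used in the loops above.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, on_text=None) -> str:
    """Join the text deltas from a stream, optionally echoing each piece."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:          # tolerate chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:                       # skip None or empty deltas
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk, used only to exercise collect_stream offline."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```

With a real stream, a call such as `collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))` would print as it goes and still return the full reply.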

Step 5: Multimodal (Image Understanding)

Use the GLM-4.7-Vision model to analyze images:

import base64
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

# Read the image and base64-encode it
with open("diagram.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-4.7-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this architecture diagram and identify any potential issues."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_base64}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

You can also pass an image URL directly:

response = client.chat.completions.create(
    model="glm-4.7-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
)
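
The base64 example hardcodes `image/png` in the data URL, which would mislabel a JPEG or WebP file. As a small sketch, the helper below guesses the MIME type from the file name using the standard library. `to_data_url` and `image_part` are names introduced here, and whether the API accepts every image type is not something this guide confirms.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(filename: str, data: bytes) -> dict:
    """Build an image_url content part shaped like the ones in the messages above."""
    return {"type": "image_url", "image_url": {"url": to_data_url(filename, data)}}
```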

Step 6: Function Calling (Tool Use)

GLM-4.7 supports function calling, which is the basis for building agent applications:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'Beijing'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "What is the weather like in Shanghai today?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Check whether the model decided to call a function
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
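
Printing the call is only the first half of the loop: the application still has to run the function and hand the result back to the model. The sketch below shows one way to do that dispatch. It assumes the OpenAI-style convention that `arguments` is a JSON string and that results are returned as a `role: "tool"` message with a `tool_call_id`; the guide itself does not show this step. `get_weather` is a hypothetical stub, and `REGISTRY`, `run_tool_call`, and `example_call` are names introduced here.

```python
import json
from types import SimpleNamespace


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "unit": unit, "forecast": "unknown (stub)"}


# Maps tool names declared in `tools` to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return a message carrying its result."""
    func = REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # assumed to be a JSON string
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False),
    }


# Offline stand-in shaped like the tool_call object inspected above.
example_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Shanghai"}'),
)
```

Under those assumptions, the returned message would be appended to `messages` after the assistant message, followed by a second `create` call so the model can phrase the final answer.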

API Reference

Endpoints

Endpoint	Method	Description
`/v4/chat/completions`	POST	Chat completions (text and vision)
`/v4/embeddings`	POST	Text embeddings
`/v4/files`	POST	Upload files to use as context
`/v4/fine-tuning/jobs`	POST	Create a fine-tuning job
`/v4/images/generations`	POST	Image generation (CogView)

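Request Parameters

Parameter	Type	Default	Description
`model`	string	Required	Model ID (glm-4.7, glm-4.7-flash, etc.)
`messages`	array	Required	Array of conversation messages
`temperature`	float	0.95	Sampling temperature (0.0 - 1.0)
`top_p`	float	0.7	Nucleus sampling parameter
`max_tokens`	integer	1024	Maximum response length, in tokens
`stream`	boolean	false	Whether to stream the response
`tools`	array	null	Function definitions available for tool calling
`tool_choice`	string	"auto"	Tool selection strategy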
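
The defaults and ranges in this table can be checked on the client before a request is sent. The following is a rough sketch of that idea. `DEFAULTS` and `build_payload` are names introduced here, the values are copied from the table, and the server's actual validation may differ.

```python
# Defaults copied from the request-parameters table.
DEFAULTS = {"temperature": 0.95, "top_p": 0.7, "max_tokens": 1024, "stream": False}


def build_payload(model: str, messages: list, **overrides) -> dict:
    """Merge caller overrides over the documented defaults and validate them."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must be non-empty")
    unknown = set(overrides) - set(DEFAULTS) - {"tools", "tool_choice"}
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    payload = {"model": model, "messages": messages, **DEFAULTS, **overrides}
    if not 0.0 <= payload["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return payload
```

Catching a typo such as `max_token` locally is cheaper than discovering it from an error response or, worse, from a silently ignored field.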

Pricing

Pricing is one of GLM-4.7's core advantages:

Model	Input (per 1M tokens)	Output (per 1M tokens)
GLM-4.7	$0.50	$1.50
GLM-4.7-Flash	$0.05	$0.15
GLM-4.7-Vision	$0.60	$1.80
GLM-4.7-Code	$0.40	$1.20

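Cost Comparison

Model	Input Cost	Output Cost	Compared with GLM-4.7
GLM-4.7	$0.50/1M	$1.50/1M	--
GPT-4o	$2.50/1M	$10.00/1M	5-7x more expensive
Claude Sonnet 4	$3.00/1M	$15.00/1M	6-10x more expensive
Gemini 2.0 Flash	$0.075/1M	$0.30/1M	5-7x cheaper; GLM-4.7-Flash is about 33-50% cheaper than it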

For cost-sensitive applications, GLM-4.7-Flash is highly competitive at just $0.05 per million input tokens.
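
The per-token prices translate into a workload estimate with a few lines of arithmetic. The sketch below uses only the figures from the tables above. The dictionary keys are labels chosen for this example rather than confirmed API model IDs, and `estimate_cost` and `cost_ratio` are names introduced here.

```python
# USD per 1M tokens as (input, output), copied from the pricing tables above.
PRICES = {
    "glm-4.7": (0.50, 1.50),
    "glm-4.7-flash": (0.05, 0.15),
    "glm-4.7-vision": (0.60, 1.80),
    "glm-4.7-code": (0.40, 1.20),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-2.0-flash": (0.075, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_ratio(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more (or less) `model` costs than `baseline` for one workload."""
    return estimate_cost(model, input_tokens, output_tokens) / estimate_cost(
        baseline, input_tokens, output_tokens
    )
```

Because input and output are priced differently, the ratio between two models depends on the mix of input and output tokens, which is why the comparison table gives ranges rather than single multipliers.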

Best Practices

1. Choose the Right Model for the Task

# Use GLM-4.7 for complex reasoning
complex_response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Analyze this legal document..."}]
)

# Use GLM-4.7-Flash for simple, high-volume tasks
simple_response = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[{"role": "user", "content": "Classify this email as spam or not spam."}]
)
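
In a larger application the same choice can live in one place instead of being repeated at every call site. This is a minimal sketch of such a lookup. The task labels are invented for illustration, and `glm-4.7-code` is an assumed ID inferred from the variant name, since the guide only shows `glm-4.7`, `glm-4.7-flash`, and `glm-4.7-vision` in code.

```python
# Task labels are illustrative; model choices follow the variants table.
MODEL_FOR_TASK = {
    "reasoning": "glm-4.7",
    "long_document": "glm-4.7",
    "classification": "glm-4.7-flash",
    "extraction": "glm-4.7-flash",
    "image": "glm-4.7-vision",
    "code": "glm-4.7-code",  # assumed ID, not shown in the guide's code
}


def pick_model(task: str, default: str = "glm-4.7") -> str:
    """Return the model for a task label, falling back to the general model."""
    return MODEL_FOR_TASK.get(task, default)
```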

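2. Take Advantage of the 128K Context Window

GLM-4.7 supports a context of up to 128K tokens (GLM-4.7-Code is listed at 32K). This is useful for:

Analyzing long documents without chunking
Multi-turn conversations that keep the full history
Processing large codebases in a single request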
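
Before sending a very long input, it helps to check roughly whether it fits. The sketch below is a crude pre-flight check, not a tokenizer: the 4-characters-per-token ratio is an assumption that is only loosely reasonable for English, and Chinese text typically needs a much lower value. It also treats 128K as 128,000 tokens. `estimate_tokens` and `fits_in_context` are names introduced here.

```python
CONTEXT_WINDOW = 128_000  # tokens, taking the documented 128K as 128,000


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the ratio is an assumption, not a tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(
    text: str,
    reserved_output: int = 1024,
    window: int = CONTEXT_WINDOW,
    chars_per_token: float = 4.0,
) -> bool:
    """True if the estimated input plus the reserved output fits in the window."""
    return estimate_tokens(text, chars_per_token) + reserved_output <= window
```

Reserving room for the response matters because the prompt and the generated output generally share the same context window.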

3. Handle Errors Gracefully

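# Exception class names in the zhipuai package may differ by version, so this
# example uses the OpenAI SDK's exception classes through the OpenAI-compatible
# endpoint from Step 3.
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(
    api_key="your-zhipu-api-key",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

try:
    response = client.chat.completions.create(
        model="glm-4.7",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    # HTTP 429: too many requests
    print("Rate limit hit. Please retry later.")
except APIStatusError as e:
    # Any other non-2xx response
    print(f"API error: {e.status_code} - {e.message}")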
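
Rate-limit errors are usually transient, so a common next step after catching one is to retry with exponential backoff rather than just printing a message. The sketch below is SDK-agnostic: `call_with_retries` is a name introduced here, and the exception types to retry on are passed in by the caller. The delays and attempt count are illustrative defaults, not values recommended by the platform.

```python
import random
import time


def call_with_retries(fn, retry_on=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

In practice `fn` would wrap the `create` call in a lambda or small function, and `retry_on` would be the SDK's rate-limit exception class. The `sleep` parameter exists so the function can be tested without real waiting.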

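4. Optimize for Chinese-Language Tasks

GLM-4.7 is particularly strong at Chinese-language processing. If your application serves Chinese-speaking users, it may outperform GPT-4o and Claude: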

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "你是一位专业的中文文案撰稿人。"},
        {"role": "user", "content": "为一款针对中国年轻职场人士的新款智能手机撰写发布会营销文案。"}
    ]
)

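Limitations

International latency: Servers are located primarily in China, so users outside Asia may experience higher latency.
Content filtering: Chinese AI models are subject to strict content moderation requirements in line with local regulations.
Documentation: The primary documentation is in Chinese, although the English documentation is steadily improving.
Ecosystem: Fewer prebuilt third-party integrations are available than for OpenAI or Anthropic models.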

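Conclusion

With strong performance and highly competitive pricing, GLM-4.7 is a compelling choice for developers, especially for applications that serve the Chinese market or run cost-sensitive workloads. Its OpenAI-compatible API format makes it easy to adopt, even for developers migrating from GPT-4.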
