GLM 5.1 API

Long-context Zhipu AI reasoning model with 202K context, 128K output, tool calling, structured output, and cache support.

Z.ai | Text Generation | 202K context | Released Apr 7, 2026 | China | Proprietary Endpoint | New

About GLM 5.1


Notes:
- Served by Alibaba Cloud Model Studio in China deployment mode
- Context window: 202K tokens
- Maximum output: 128K tokens
- Supports function calling, structured output, and context cache
- Structured output should run with enable_thinking=false
- Does not support web search, batches, prefix continuation, or fine-tuning
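
The structured-output note can be illustrated with a small request builder. This is a minimal sketch, not the documented request shape: it assumes enable_thinking is accepted as a top-level request field and that response_format follows the OpenAI-style json_schema layout. The helper name build_structured_request and the example schema are hypothetical.

```python
import json


def build_structured_request(prompt: str, schema: dict, schema_name: str = "result") -> dict:
    """Build a Chat Completions request body for schema-constrained output."""
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        # Per the notes, structured output should run with thinking disabled.
        "enable_thinking": False,
        # Assumption: OpenAI-style JSON schema response format.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "schema": schema, "strict": True},
        },
    }


# Hypothetical schema used only for illustration.
haiku_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "lines": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "lines"],
    "additionalProperties": False,
}

body = build_structured_request("Write a haiku about the ocean.", haiku_schema)
print(json.dumps(body, indent=2))
```

When calling through the OpenAI Python SDK, fields the SDK does not model, such as enable_thinking, would typically be passed through the extra_body argument of client.chat.completions.create.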

Also known as Z.ai GLM 5.1, GLM-5.1, glm-5-1

reasoning, function calling, structured output, cache

GLM 5.1 specs

Model ID: glm-5-1
Provider: Z.ai
Category: Text Generation
Released: Apr 7, 2026
Context window: 202K tokens
Input: Text
Output: Text
Region: China
Endpoints:
- POST /v1/chat/completions
- POST /v1/responses
- POST /v1/messages

GLM 5.1 API pricing (save up to 41%)

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

| Type | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | <=32K: $0.825 (was $1.40); 32K-200K: $1.10 (was $1.40) |
| Output | per 1M generated tokens | <=32K: $3.301 (was $4.40); 32K-200K: $3.851 (was $4.40) |
| Implicit cache read | per 1M cached input tokens | <=32K: $0.165 (was $0.26); 32K-200K: $0.22 (was $0.26) |
| Web Search (Linkup) | per call when invoked | $0.013 |
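
To make the tiered rates above concrete, here is a rough cost estimator. It is a sketch under stated assumptions, not the billing logic: it assumes the tier is selected by the total prompt token count, that the "32K" boundary means 32,000 tokens, and that cached input tokens are billed at the cache-read rate instead of the input rate. The names Tier and estimate_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Discounted rates in USD per 1M tokens, copied from the rate card above."""
    input_rate: float
    output_rate: float
    cache_read_rate: float


SHORT_TIER = Tier(input_rate=0.825, output_rate=3.301, cache_read_rate=0.165)  # <=32K
LONG_TIER = Tier(input_rate=1.10, output_rate=3.851, cache_read_rate=0.22)     # 32K-200K


def estimate_cost(prompt_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, tier_boundary: int = 32_000) -> float:
    """Estimate the USD cost of one request.

    Assumptions (not confirmed by the page): the tier depends on prompt_tokens,
    the boundary is 32,000 tokens, and cached tokens replace input billing.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    tier = SHORT_TIER if prompt_tokens <= tier_boundary else LONG_TIER
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * tier.input_rate
        + cached_tokens * tier.cache_read_rate
        + output_tokens * tier.output_rate
    )
    return total / 1_000_000


# 10K prompt tokens, 2K output tokens, no cache hits.
print(f"${estimate_cost(10_000, 2_000):.6f}")  # $0.014852
# 100K prompt tokens (half served from cache), 2K output tokens.
print(f"${estimate_cost(100_000, 2_000, cached_tokens=50_000):.6f}")  # $0.073702
```

The live rate card remains the authority for what is charged; an estimate like this is only useful for budgeting.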

How to call the GLM 5.1 API

GLM 5.1 serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-5-1. Get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-1",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)

response = client.chat.completions.create(
    model="glm-5-1",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)

GLM 5.1 API parameters

Request parameters supported by the GLM 5.1 API on EmpirioLabs. Defaults apply when a field is omitted.

| Parameter | Type | Default | Range / values | Description |
| --- | --- | --- | --- | --- |
| max_tokens | integer | 4096 | 1 to 128000 | Maximum number of output tokens to generate. |
| temperature | number | 1 | 0 to 2 | Controls randomness. Lower values make responses more deterministic. |
| top_p | number | 0.95 | 0 to 1 | Nucleus sampling cutoff. |
| top_k | integer | 20 | 1 to 100 | Limits sampling to the top K tokens. |
| repetition_penalty | number | 1 | 0.1 to 2 | Penalizes repeated tokens. |
| reasoning_effort | enum | medium | none, low, medium, high, max | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style... |
| enable_thinking | boolean | true | - | Allow the model to reason before answering. Disable this for strict structured output. |
| thinking_budget | integer | 32768 | 1 to 38912 | Maximum tokens available for reasoning content when thinking is enabled. |
| tool_stream | boolean | false | - | Stream function-call arguments incrementally when streaming. |
| tools | array | [] | - | OpenAI-compatible function calling tool definitions. |
| tool_choice | object | - | - | OpenAI-compatible tool choice control. |
| parallel_tool_calls | boolean | true | - | Allow multiple tool calls in a single assistant turn when supported. |
| response_format | object | - | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| stop | array | - | - | Optional stop sequences. |
2 more parameters in the docs
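
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, assuming the ranges in the table are inclusive at both ends; the helper validate_params is hypothetical and covers only the numeric parameters and reasoning_effort listed above, not the two parameters the table omits.

```python
# Numeric ranges copied from the parameter table (assumed inclusive).
NUMERIC_RANGES = {
    "max_tokens": (1, 128000),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (1, 100),
    "repetition_penalty": (0.1, 2),
    "thinking_budget": (1, 38912),
}

# Allowed values for the reasoning_effort enum.
REASONING_EFFORT_VALUES = {"none", "low", "medium", "high", "max"}


def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside the range {low} to {high}")
        elif name == "reasoning_effort" and value not in REASONING_EFFORT_VALUES:
            allowed = ", ".join(sorted(REASONING_EFFORT_VALUES))
            problems.append(f"reasoning_effort={value!r} is not one of: {allowed}")
    return problems


print(validate_params({"temperature": 0.7, "top_k": 20, "max_tokens": 4096}))
print(validate_params({"temperature": 3, "reasoning_effort": "extreme"}))
```

Parameters the helper does not recognize are passed over silently, so the API's own validation still has the final say.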

GLM 5.1 API: common questions

How much does the GLM 5.1 API cost?

On EmpirioLabs, GLM 5.1 is billed pay-as-you-go, with two rate tiers (<=32K and 32K-200K):
- Input, per 1M prompt tokens: $0.825 for <=32K (was $1.40); $1.10 for 32K-200K (was $1.40)
- Output, per 1M generated tokens: $3.301 for <=32K (was $4.40); $3.851 for 32K-200K (was $4.40)
- Implicit cache read, per 1M cached input tokens: $0.165 for <=32K (was $0.26); $0.22 for 32K-200K (was $0.26)

The live rate card on this page always matches what the API charges.

What is the context window of GLM 5.1?

GLM 5.1 supports a 202K-token context window.

Is the GLM 5.1 API OpenAI-compatible?

Yes. GLM 5.1 serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to glm-5-1.
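
Because the endpoint takes a plain JSON POST, a client without the OpenAI SDK can be sketched with the Python standard library alone. This example only builds the request and leaves the network call commented out; the helper name build_chat_request is hypothetical, and the response shape in the commented lines assumes the same OpenAI-style layout the SDK example reads.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.empiriolabs.ai/v1"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for /chat/completions without sending it."""
    body = {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the environment, falling back to a placeholder.
api_key = os.environ.get("EMPIRIOLABS_API_KEY", "YOUR_EMPIRIOLABS_API_KEY")
request = build_chat_request("Write a haiku about the ocean.", api_key)
print(request.get_method(), request.full_url)

# Sending is commented out so the sketch runs without network access or a real key:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```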

Can I try GLM 5.1 in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 5.1 in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM 5.1 API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you pay only for the requests you make.
