GLM 5.2 API: Pricing, Playground & Docs

About GLM 5.2

GLM 5.2 is a reasoning and coding model with a 1M-token context window, up to 128K output tokens, adjustable reasoning effort, native web search, and tool calling.

Notes:
- Context window: 1M tokens
- Maximum output: 128K tokens
- Adjustable reasoning effort from minimal to max (max recommended for complex coding)
- Built-in web search adds $0.033 per request when used
- Supports function calling, structured output, and streaming
- Run structured output with thinking disabled

Also known as Z.ai GLM 5.2, GLM-5.2, glm-5-2

reasoning, function calling, structured output, web search

GLM 5.2 specs

Model ID: glm-5-2
Provider: Z.ai
Category: Text Generation
Released: Jun 16, 2026
Context window: 1M tokens
Max output: 131,072 tokens
Input: Text
Output: Text
Region: Singapore
Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

GLM 5.2 API pricing

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type	Spec	Rate
Input	per 1M prompt tokens	$1.40
Output	per 1M generated tokens	$4.40
Web Search	per request	$0.033
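
The rate card above translates into a simple per-call estimate. The following is a minimal sketch, not an official billing formula: `estimate_cost` is a hypothetical helper, and it assumes that every prompt token is billed at the input rate and that the web search fee applies once per request that uses search.

```python
from decimal import Decimal

# Rates copied from the rate card above (USD).
INPUT_PER_M = Decimal("1.40")             # per 1M prompt tokens
OUTPUT_PER_M = Decimal("4.40")            # per 1M generated tokens
WEB_SEARCH_PER_REQUEST = Decimal("0.033")


def estimate_cost(prompt_tokens: int, output_tokens: int,
                  web_search_requests: int = 0) -> Decimal:
    """Hypothetical helper: estimate the USD cost of a workload.

    Assumes all prompt tokens are billed at the input rate and that the
    web search fee is charged once per request that uses search.
    """
    million = Decimal(1_000_000)
    return (
        INPUT_PER_M * prompt_tokens / million
        + OUTPUT_PER_M * output_tokens / million
        + WEB_SEARCH_PER_REQUEST * web_search_requests
    )
```

For example, `estimate_cost(200_000, 10_000, 1)` adds $0.28 of input, $0.044 of output, and $0.033 of web search, for $0.357 in total.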

How to call the GLM 5.2 API

GLM 5.2 is served through an OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1, authenticate with your EmpirioLabs API key, and use the model id glm-5-2. API keys are issued from the EmpirioLabs dashboard.

cURL

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-2",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

Python (OpenAI SDK)

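import os

from openai import OpenAI

# Point the OpenAI SDK at the EmpirioLabs endpoint.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    # Read the key from the same environment variable the cURL example uses.
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

response = client.chat.completions.create(
    model="glm-5-2",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
# Print the text of the first (and only) completion.
print(response.choices[0].message.content)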
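
Since GLM 5.2 supports streaming, the same call can be made with `stream=True`. The sketch below assumes the endpoint emits OpenAI-style chunks with the answer text in `choices[0].delta.content`; `iter_text` and `stream_haiku` are hypothetical helper names, and the key is assumed to be in the `EMPIRIOLABS_API_KEY` environment variable.

```python
import os
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text pieces from OpenAI-style streaming chunks.

    Chunks with no choices or with an empty delta are skipped.
    """
    for chunk in chunks:
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def stream_haiku() -> str:
    """Stream one completion, printing text as it arrives (needs network access)."""
    from openai import OpenAI  # imported here so iter_text has no dependencies

    client = OpenAI(
        base_url="https://api.empiriolabs.ai/v1",
        api_key=os.environ["EMPIRIOLABS_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="glm-5-2",
        messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
        stream=True,  # ask the server to send the reply incrementally
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

Calling `stream_haiku()` prints the reply incrementally and returns the full text. Keeping the chunk handling in `iter_text` makes it easy to check against fake chunks without calling the API.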

GLM 5.2 API parameters

Request parameters supported by the GLM 5.2 API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter	Type	Default	Range / values	Description
max_tokens	integer	65536	1 to 131072	Maximum number of output tokens to generate.
temperature	number	1	0 to 1	Controls randomness. Lower values make responses more deterministic.
top_p	number	0.95	0.01 to 1	Nucleus sampling cutoff.
reasoning_effort	enum	max	none, minimal, low, medium, high, xhigh, max	GLM-5.2 reasoning effort. none disables thinking; minimal through max set how hard the model reasons before answering. max is recommended for complex coding.
enable_thinking	boolean	true	-	Allow the model to reason before answering. Turn off for the lowest-latency replies or strict structured output.
do_sample	boolean	true	-	Enable sampling. Turn off for greedy deterministic output (temperature and top_p are ignored).
tool_web_search	boolean	false	-	Enable built-in web search. Adds $0.033 per request when used.
search_recency_filter	enum	noLimit	oneDay, oneWeek, oneMonth, oneYear, noLimit	Limit web search results to a recency window.
count	integer	10	1 to 50	Number of web search results to retrieve when web search is enabled.
search_domain_filter	string	-	-	Restrict web search to a specific domain.
search_prompt	string	-	-	Optional prompt used to summarize retrieved web search results.
search_result	boolean	true	-	Return web search result metadata in the response when web search is enabled.
tool_stream	boolean	false	-	Stream function-call arguments incrementally when streaming.
tools	array	[]	-	OpenAI-compatible function calling tool definitions.
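
One way to pass these parameters through the OpenAI Python SDK is sketched below. It is a sketch under assumptions rather than a documented recipe: `build_request` is a hypothetical helper, the ranges are copied from the table above, and the provider-specific fields are sent through the SDK's `extra_body` argument on the assumption that the endpoint reads them from the top level of the JSON request body.

```python
from typing import Any

# Allowed values copied from the parameter table above.
REASONING_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh", "max"}
RECENCY_FILTERS = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def build_request(
    prompt: str,
    *,
    max_tokens: int = 65536,
    temperature: float = 1.0,
    top_p: float = 0.95,
    reasoning_effort: str = "max",
    web_search: bool = False,
    count: int = 10,
    search_recency_filter: str = "noLimit",
) -> dict[str, Any]:
    """Hypothetical helper: validate options and build kwargs for
    client.chat.completions.create()."""
    if not 1 <= max_tokens <= 131072:
        raise ValueError("max_tokens must be between 1 and 131072")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")

    # Fields that are not standard SDK arguments go into extra_body,
    # which the OpenAI SDK merges into the JSON request body.
    extra_body: dict[str, Any] = {"reasoning_effort": reasoning_effort}
    if web_search:
        if not 1 <= count <= 50:
            raise ValueError("count must be between 1 and 50")
        if search_recency_filter not in RECENCY_FILTERS:
            raise ValueError(f"unknown recency filter: {search_recency_filter!r}")
        extra_body.update(
            tool_web_search=True,  # billed per request when used
            count=count,
            search_recency_filter=search_recency_filter,
        )

    return {
        "model": "glm-5-2",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "extra_body": extra_body,
    }
```

With a configured client, the result can be unpacked directly, for example `client.chat.completions.create(**build_request("Summarize today's AI news.", web_search=True, count=5, search_recency_filter="oneDay"))`. Validating locally catches out-of-range values before a request is sent.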

3 more parameters in the docs

GLM 5.2 variants

Variants are alternate versions of GLM 5.2 with their own model id. Depending on the variant, they can differ in serving region, pricing, or supported parameters; everything else works the same way.

GLM 5.2 :variant1 (Germany, 1M context, save up to 21%)

Call it with the model id glm-5-2:variant1.

Type	Spec	Standard rate	:variant1 rate
Input	per 1M prompt tokens	$1.40	$1.10
Output	per 1M generated tokens	$4.40	$3.851
Implicit cache read	per 1M cached input tokens	-	$0.275
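
To see what the variant rates mean for a given workload, the two rate cards can be compared side by side. This is a rough sketch with hypothetical helper names (`cost`, `savings_percent`). The rates are copied as they appear on this page, and it assumes that cached input tokens are billed at the cache-read rate in place of the input rate, and that a model with no listed cache-read rate bills all input at the full input rate.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Rates copied from the rate cards on this page (USD per 1M tokens).
DEFAULT = {"input": Decimal("1.40"), "output": Decimal("4.40")}
VARIANT1 = {
    "input": Decimal("1.10"),
    "output": Decimal("3.851"),
    "cache_read": Decimal("0.275"),
}


def cost(rates: dict, prompt_tokens: int, output_tokens: int,
         cached_tokens: int = 0) -> Decimal:
    """Hypothetical helper: USD cost of a workload under one rate card.

    Assumes cached prompt tokens are billed at the cache-read rate when the
    rate card lists one, and at the normal input rate otherwise.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    cache_rate = rates.get("cache_read", rates["input"])
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        rates["input"] * uncached_tokens
        + cache_rate * cached_tokens
        + rates["output"] * output_tokens
    )
    return total / MILLION


def savings_percent(base: Decimal, other: Decimal) -> Decimal:
    """Percentage saved when paying `other` instead of `base`."""
    return (base - other) / base * 100
```

For 1M prompt tokens and 1M output tokens with no cache hits, `cost(DEFAULT, 1_000_000, 1_000_000)` comes to $5.80 and `cost(VARIANT1, 1_000_000, 1_000_000)` to $4.951. The advertised saving of up to 21% corresponds to the input rate, where `savings_percent(Decimal("1.40"), Decimal("1.10"))` is about 21.4.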

GLM 5.2 API: common questions

How much does the GLM 5.2 API cost?

On EmpirioLabs, GLM 5.2 is billed pay-as-you-go: $1.40 per 1M prompt tokens (input), $4.40 per 1M generated tokens (output), and $0.033 per web search request. The live rate card on this page always matches what the API charges.

What is the context window of GLM 5.2?

GLM 5.2 supports a 1M-token context window with up to 131,072 (128K) output tokens per response.

Is the GLM 5.2 API OpenAI-compatible?

Yes. GLM 5.2 is served through an OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work once base_url points at https://api.empiriolabs.ai/v1 and the model id is set to glm-5-2.

Which GLM 5.2 variants are available?

GLM 5.2 is available as 2 model ids: the default glm-5-2 plus glm-5-2:variant1 (Germany). Variants can differ in serving region, pricing, or supported parameters; the rate cards for each are on this page.

Can I try GLM 5.2 in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 5.2 in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM 5.2 API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.
