
Reasoning and coding model with a 1M token context, 128K output, adjustable reasoning effort, native web search, and tool calling.
Notes:
- Context window: 1M tokens
- Maximum output: 128K tokens
- Adjustable reasoning effort from minimal to max (max recommended for complex coding)
- Built-in web search adds $0.033 per request when used
- Supports function calling, structured output, and streaming
- Run structured output with thinking disabled
Also known as Z.ai GLM 5.2, GLM-5.2, glm-5-2
Model id: glm-5-2
Endpoints:
- POST /v1/chat/completions
- POST /v1/responses
- POST /v1/messages
Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
GLM 5.2 serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-5-2. Get an API key from the EmpirioLabs dashboard.
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-2",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

from openai import OpenAI
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)
response = client.chat.completions.create(
    model="glm-5-2",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)

Request parameters supported by the GLM 5.2 API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| max_tokens | integer | 65536 | 1 to 131072 | Maximum number of output tokens to generate. |
| temperature | number | 1 | 0 to 1 | Controls randomness. Lower values make responses more deterministic. |
| top_p | number | 0.95 | 0.01 to 1 | Nucleus sampling cutoff. |
| reasoning_effort | enum | max | none, minimal, low, medium, high, xhigh, max | GLM-5.2 reasoning effort. none disables thinking; minimal through max set how hard the model reasons before answering. max is recommended for complex coding. |
| enable_thinking | boolean | true | - | Allow the model to reason before answering. Turn off for the lowest-latency replies or strict structured output. |
| do_sample | boolean | true | - | Enable sampling. Turn off for greedy deterministic output (temperature and top_p are ignored). |
| tool_web_search | boolean | false | - | Enable built-in web search. Adds $0.033 per request when used. |
| search_recency_filter | enum | noLimit | oneDay, oneWeek, oneMonth, oneYear, noLimit | Limit web search results to a recency window. |
| count | integer | 10 | 1 to 50 | Number of web search results to retrieve when web search is enabled. |
| search_domain_filter | string | - | - | Restrict web search to a specific domain. |
| search_prompt | string | - | - | Optional prompt used to summarize retrieved web search results. |
| search_result | boolean | true | - | Return web search result metadata in the response when web search is enabled. |
| tool_stream | boolean | false | - | Stream function-call arguments incrementally when streaming. |
| tools | array | [] | - | OpenAI-compatible function calling tool definitions. |
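
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch rather than an official client: the helper name `build_payload` is hypothetical, and it assumes the table's parameters are sent as top-level fields of the JSON request body, which the table does not state explicitly.

```python
import json

# Allowed values copied from the parameter table above.
REASONING_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh", "max"}
RECENCY_FILTERS = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def build_payload(prompt, *, max_tokens=65536, temperature=1.0, top_p=0.95,
                  reasoning_effort="max", web_search=False, count=10,
                  search_recency_filter="noLimit"):
    """Build a Chat Completions body, rejecting out-of-range values.

    Hypothetical helper: assumes each table parameter is a top-level
    field of the JSON body.
    """
    if not 1 <= max_tokens <= 131072:
        raise ValueError("max_tokens must be between 1 and 131072")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")

    payload = {
        "model": "glm-5-2",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "reasoning_effort": reasoning_effort,
    }
    # Search options are only attached when web search is switched on.
    if web_search:
        if not 1 <= count <= 50:
            raise ValueError("count must be between 1 and 50")
        if search_recency_filter not in RECENCY_FILTERS:
            raise ValueError(f"unknown recency filter: {search_recency_filter!r}")
        payload["tool_web_search"] = True
        payload["count"] = count
        payload["search_recency_filter"] = search_recency_filter
    return payload


body = build_payload(
    "What changed in the latest release?",
    reasoning_effort="low",
    web_search=True,
    count=5,
    search_recency_filter="oneWeek",
)
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, fields that are not named arguments of `client.chat.completions.create`, such as `tool_web_search`, can be passed through its `extra_body` argument.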
Variants are alternate versions of GLM 5.2 with their own model id. Depending on the variant, they can differ in serving region, pricing, or supported parameters; everything else works the same way.
Call the Germany variant with the model id glm-5-2:variant1.
On EmpirioLabs, GLM 5.2 is billed pay-as-you-go: input at $1.40 per 1M prompt tokens, output at $4.40 per 1M generated tokens, and web search at $0.033 per request. The live rate card on this page always matches what the API charges.
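
As a worked example of the rates above, the sketch below estimates the cost of a single request. The function name `estimate_cost` is hypothetical, and the sketch assumes the token counts are the prompt and generated token totals the API reports for the request; how reasoning tokens are counted is not stated on this page.

```python
from decimal import Decimal

# Rates quoted above, in US dollars.
INPUT_PER_MILLION = Decimal("1.40")
OUTPUT_PER_MILLION = Decimal("4.40")
WEB_SEARCH_PER_REQUEST = Decimal("0.033")
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens, output_tokens, web_search_requests=0):
    """Return the estimated charge in dollars as a Decimal."""
    input_cost = Decimal(prompt_tokens) / MILLION * INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) / MILLION * OUTPUT_PER_MILLION
    search_cost = Decimal(web_search_requests) * WEB_SEARCH_PER_REQUEST
    return input_cost + output_cost + search_cost


# 200K prompt tokens, 8K generated tokens, one web search.
print(estimate_cost(200_000, 8_000, web_search_requests=1))
```

For this request the three components are $0.28 for input, $0.0352 for output, and $0.033 for the search, about $0.35 in total. `Decimal` is used so that the small per-token amounts are not distorted by binary floating-point rounding.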
GLM 5.2 supports a 1M-token context window with up to 131,072 output tokens per response.
GLM 5.2 serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to glm-5-2.
GLM 5.2 is available as 2 model ids: the default glm-5-2 plus glm-5-2:variant1 (Germany). Variants can differ in serving region, pricing, or supported parameters; the rate cards for each are on this page.
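
Switching variants only changes the `model` field of the request. The sketch below prepares a Chat Completions request for either model id without sending it; the helper name `build_request` and the dictionary keys `default` and `germany` are hypothetical labels chosen for illustration, and the API key is read from the `EMPIRIOLABS_API_KEY` environment variable used in the curl example.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.empiriolabs.ai/v1"

# Keys are hypothetical labels; values are the two model ids listed above.
MODEL_IDS = {"default": "glm-5-2", "germany": "glm-5-2:variant1"}


def build_request(prompt, region="default"):
    """Prepare (but do not send) a Chat Completions request for one variant."""
    body = {
        "model": MODEL_IDS[region],
        "messages": [{"role": "user", "content": prompt}],
    }
    api_key = os.environ.get("EMPIRIOLABS_API_KEY", "")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("Write a haiku about the ocean.", region="germany")
print(json.loads(request.data)["model"])  # prints glm-5-2:variant1
```

Passing the prepared object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a key or network access.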
The EmpirioLabs playground runs GLM 5.2 in the browser with the same parameters the API exposes, so you can test prompts before writing code.
Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.