Disclosure: This article was written with AI assistance and reviewed by EmpirioLabs AI.
Kimi K2.7 Code Highspeed is the faster-serving tier of Moonshot AI's Kimi K2.7 Code, now live on EmpirioLabs. It is the same trillion-parameter agentic coding model, tuned for code generation, debugging, tool use, and long multi-step engineering workflows, served on a higher-throughput, lower-latency path for teams that want answers back faster. Capabilities are identical to the standard tier: a 262,144-token context window, always-on reasoning, function calling, JSON mode structured output, and text, image, and video inputs.
If you do not need the extra speed, the standard Kimi K2.7 Code tier is the better-value option. Reach for Highspeed when latency or throughput matters more than the per-token rate. Try it in the playground, read the API docs, or see the full spec on the model page.
Pricing
Billing is strictly usage-based, with no subscription. Input and output tokens are metered per token. Each invoked web search adds a small per-call fee, and that fee applies only when a search actually runs. Highspeed is the premium-speed tier, so its per-token rates are higher than those of the standard Kimi K2.7 Code tier. The exact current rates for both tiers are always on their model pages (Highspeed, standard) and on the pricing page. Reasoning is always on and reasoning tokens are billed as output tokens, so set your max tokens budget with that in mind.
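Reasoning is billed as output, so the output side usually dominates a request's cost. The sketch below estimates the cost of a single request. All rate values are hypothetical placeholders, not EmpirioLabs prices. The `Rates` class, the per-million convention, and `estimate_cost` are illustrative names introduced here. The sketch also assumes the completion token count you pass in already includes reasoning tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD rates. Read the real values from the model/pricing pages."""
    input_per_million: float   # USD per 1M input tokens (placeholder)
    output_per_million: float  # USD per 1M output tokens, reasoning included (placeholder)
    per_web_search: float      # USD per invoked web search (placeholder)


def estimate_cost(rates: Rates, prompt_tokens: int, completion_tokens: int,
                  web_searches: int = 0) -> float:
    """Estimate the USD cost of one request.

    completion_tokens is assumed to already include reasoning tokens,
    because reasoning is billed as output.
    """
    return (
        prompt_tokens * rates.input_per_million / 1_000_000
        + completion_tokens * rates.output_per_million / 1_000_000
        + web_searches * rates.per_web_search  # only searches that actually ran
    )


# Illustrative only: made-up rates, 2,000 prompt tokens,
# 6,000 output tokens (answer + reasoning), one web search.
example = estimate_cost(Rates(1.0, 4.0, 0.005), 2_000, 6_000, web_searches=1)
```

Running the same token counts through two `Rates` objects, one per tier, gives a quick way to weigh the Highspeed premium against the latency you gain.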
Quickstart
Kimi K2.7 Code Highspeed is OpenAI-compatible, so the official SDKs work by pointing the base URL at EmpirioLabs and setting the model to kimi-k2-7-code-highspeed:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_EMPIRIOLABS_API_KEY",
    base_url="https://api.empiriolabs.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2-7-code-highspeed",
    messages=[
        {"role": "user", "content": "Write a Python function that merges overlapping intervals."}
    ],
)

print(response.choices[0].message.reasoning_content)  # the model's reasoning
print(response.choices[0].message.content)  # the final answer
```

Streaming, function calling, JSON mode, the Anthropic-style `/v1/messages` endpoint, and the `/v1/responses` endpoint all work out of the box, exactly as they do on the standard tier.
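When you stream, reasoning and answer text arrive as separate deltas. The helper below accumulates both from a stream of OpenAI-style chunks. Two points are assumptions, not documented behavior: that streamed deltas carry reasoning in a `reasoning_content` field (mirroring the non-streaming response) and that a chunk may arrive with an empty `choices` list. `collect_stream` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate streamed chat completion chunks into (reasoning, answer)."""
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is not in the official OpenAI schema,
        # so read it defensively (assumed field name).
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if delta.content:
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)


# Usage with the client from the quickstart:
# stream = client.chat.completions.create(
#     model="kimi-k2-7-code-highspeed", messages=[...], stream=True)
# reasoning, answer = collect_stream(stream)
```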
Things to know before you build
- Same model, faster serving. Highspeed and standard Kimi K2.7 Code are the same model with the same outputs and the same 262,144-token context. Highspeed trades a higher per-token price for lower latency and higher throughput. To switch tiers, change only the `model` field.
- Thinking is always on. Every response includes `reasoning_content` ahead of the final answer, and it cannot be disabled. Reasoning counts toward output tokens and toward your max tokens limit, so leave headroom. The API accepts up to 131,072 output tokens per request.
- Sampling is fixed. The model service runs pinned sampling settings, so it accepts `temperature`, `top_p`, and penalty overrides but ignores them instead of rejecting them. Your existing OpenAI-style code works unchanged.
- Web search is built in. Set `"tool_web_search": true` on any chat request and the model runs its hosted web search tool itself: it decides when to search, reads live results, and cites sources in the answer. Each invoked search adds a small per-search fee. The fee is billed only when a search actually runs and is reported in `usage.tool_usage.web_search`.
- Tool calls carry reasoning. When you run your own function-calling loops, replay the assistant message with its `reasoning_content` field intact. The model service requires the current turn's reasoning to stay in context during multi-step tool calling.
- It is genuinely multimodal. Image and video inputs work through standard OpenAI content arrays, so you can debug from screenshots or screen recordings.
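Several of these points come down to how you build the request. The sketch below collects them in one place. It takes the model ID as a parameter, so switching tiers is a one-field change. It caps `max_tokens` at the documented 131,072 so reasoning has room. It forwards the non-standard `tool_web_search` flag through the OpenAI Python SDK's `extra_body` argument. `build_request` and the 32,768 default are illustrative choices, not API requirements. The standard tier's model ID is not assumed here; take it from its model page.

```python
from typing import Any, Dict, List

MAX_OUTPUT_TOKENS = 131_072  # per-request output cap stated for this model


def build_request(messages: List[Dict[str, Any]], *,
                  model: str = "kimi-k2-7-code-highspeed",
                  max_tokens: int = 32_768,
                  web_search: bool = False) -> Dict[str, Any]:
    """Return keyword arguments for client.chat.completions.create().

    max_tokens covers reasoning *and* the final answer, so the default
    (an illustrative choice) leaves generous headroom for reasoning.
    """
    if not 0 < max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
    kwargs: Dict[str, Any] = {
        "model": model,  # the only field that changes between tiers
        "messages": messages,
        "max_tokens": max_tokens,
        # temperature/top_p are omitted: the service ignores them anyway.
    }
    if web_search:
        # Non-standard field: extra_body merges it into the JSON request body.
        kwargs["extra_body"] = {"tool_web_search": True}
    return kwargs


# Usage: client.chat.completions.create(**build_request(msgs, web_search=True))
```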
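In a hand-rolled function-calling loop, the assistant turn you append back to `messages` must keep its `reasoning_content`. The helpers below convert a response message into a replayable dict and build the matching `tool` result messages. `assistant_message_for_replay`, `tool_result_message`, and the `run_tool` function in the usage comment are hypothetical names for illustration. The message attributes read here (`content`, `tool_calls`, `id`, `function.name`, `function.arguments`) follow the OpenAI chat completions schema.

```python
from typing import Any, Dict


def assistant_message_for_replay(message: Any) -> Dict[str, Any]:
    """Convert an assistant response message into a dict for the next request,
    preserving reasoning_content so the current turn's reasoning stays in context."""
    replay: Dict[str, Any] = {"role": "assistant", "content": message.content}
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        replay["reasoning_content"] = reasoning
    if getattr(message, "tool_calls", None):
        replay["tool_calls"] = [
            {
                "id": call.id,
                "type": "function",
                "function": {
                    "name": call.function.name,
                    "arguments": call.function.arguments,  # JSON string
                },
            }
            for call in message.tool_calls
        ]
    return replay


def tool_result_message(tool_call_id: str, result: str) -> Dict[str, Any]:
    """Build the tool-role message that answers one tool call."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": result}


# One step of the loop (run_tool is a hypothetical dispatcher):
# msg = client.chat.completions.create(model=..., messages=messages, tools=tools).choices[0].message
# messages.append(assistant_message_for_replay(msg))
# for call in msg.tool_calls or []:
#     messages.append(tool_result_message(call.id, run_tool(call)))
```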
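For screenshot debugging, an image goes into the message as an OpenAI-style content array that pairs a text part with an `image_url` part. The sketch below inlines the image as a base64 data URL. It assumes the endpoint accepts data URLs the way the OpenAI API does; a hosted image URL would go in the same field. `image_message` is a hypothetical helper name. Video input is left out because its content-part format is not described here, so check the API docs for it.

```python
import base64
from typing import Any, Dict


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> Dict[str, Any]:
    """Build a user message combining a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Usage:
# with open("stack_trace.png", "rb") as f:
#     msg = image_message("Why does this test fail?", f.read())
# client.chat.completions.create(model="kimi-k2-7-code-highspeed", messages=[msg])
```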
Summary
Kimi K2.7 Code Highspeed gives you the same frontier agentic coding model as Kimi K2.7 Code, served faster for latency-sensitive work. Start in the playground, read the docs, or grab an API key and point your OpenAI SDK at https://api.empiriolabs.ai/v1 with model="kimi-k2-7-code-highspeed".



