Fugu Ultra is Sakana AI's multi-agent conductor, now live on EmpirioLabs. Instead of answering from a single model, it coordinates a pool of expert models on every request and composes their work into one answer, an approach Sakana calls a conductor. It is built for hard, high-stakes problems: complex reasoning, code generation and review, research, and long multi-step tasks.
Fugu Ultra is available today through an OpenAI-compatible API, with a 1M-token context window, text and image input, adjustable reasoning effort, function calling, structured output via JSON mode, and built-in web search. Try it in the Playground or read the API docs.
Token usage and billing
Fugu Ultra uses more tokens than a single model, so it helps to understand how usage is counted before you start. Each request coordinates several expert models internally, and all of that work is folded into the standard input and output token counts you are billed on. In practice, even a short prompt reports more input and output tokens than the visible prompt and answer alone would account for, because the model's internal coordination is counted too. Billing is straightforward usage-based per-token pricing on those input and output tokens; the exact, always-current rates, including the discounted cached-input rate, are on the model page and the pricing page.
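As a rough sketch of how the reported counts translate into a cost estimate: the snippet below assumes the response exposes the usual OpenAI-style usage fields (prompt_tokens and completion_tokens), which this post does not confirm, and it uses placeholder rates that are not EmpirioLabs prices. Rates and estimate_cost are hypothetical names for illustration, and the cached-input discount is left out for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Per-million-token prices in USD. Placeholder values only, not real rates."""
    input_per_m: float
    output_per_m: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, rates: Rates) -> float:
    """Estimate the cost of one request from its reported token counts."""
    total = prompt_tokens * rates.input_per_m + completion_tokens * rates.output_per_m
    return total / 1_000_000


# Hypothetical numbers for illustration; take real rates from the pricing page.
placeholder_rates = Rates(input_per_m=1.0, output_per_m=2.0)

# Reported counts include internal coordination, so they can be far larger
# than the visible prompt and answer.
cost = estimate_cost(prompt_tokens=12_000, completion_tokens=3_000, rates=placeholder_rates)
print(f"{cost:.4f}")  # prints 0.0180
```

With a real response, the two counts would come from resp.usage.prompt_tokens and resp.usage.completion_tokens, provided the API reports usage under those names.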
Quickstart
Point any OpenAI-compatible client at the EmpirioLabs base URL and call fugu-ultra:
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fugu-ultra",
    "reasoning_effort": "high",
    "messages": [
      {"role": "user", "content": "Outline a plan to reproduce the results of a machine learning paper."}
    ]
  }'
The Python OpenAI SDK works without changes:
from openai import OpenAI

client = OpenAI(base_url="https://api.empiriolabs.ai/v1", api_key="YOUR_EMPIRIOLABS_KEY")

resp = client.chat.completions.create(
    model="fugu-ultra",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Explain how TLS works, then critique your explanation."}],
)
print(resp.choices[0].message.content)
Good to know
- Responses can take a while. Because Fugu Ultra runs several models per request, expect anywhere from a few seconds to a few minutes on complex prompts. The full answer is returned at once when the model finishes, not token by token. Streaming requests are still accepted, but they deliver the complete response at the end instead of incrementally.
- Reasoning is always on. Set reasoning_effort to high, xhigh, or max. There is no off, low, or medium; xhigh and max apply the most effort for the hardest problems.
- Leave room for the answer. Use a generous max_tokens, since very small limits can truncate or empty the answer.
- Image input. Send images as standard OpenAI image_url content parts. Both uploads and URLs work.
- Web search. Turn on tool_web_search for built-in web search. There is no separate search fee; its cost is included in the request's token count.
- Long context. The model accepts up to a 1M token context, so you can feed it large documents and codebases.
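The notes above can be folded into a small request builder. This is a minimal sketch, not an official client: build_request, ALLOWED_EFFORTS, and REQUEST_TIMEOUT_S are hypothetical names, the max_tokens default and the timeout are arbitrary illustrative values, and the way tool_web_search is passed (a top-level boolean sent through the SDK's extra_body) is an assumption, since this post does not show the exact request shape.

```python
from typing import Any, Dict, List, Optional

# Effort levels named in the post; there is no off, low, or medium.
ALLOWED_EFFORTS = {"high", "xhigh", "max"}

# Client-side timeout in seconds. 600 is an arbitrary illustrative value that
# leaves headroom for responses that take a few minutes.
REQUEST_TIMEOUT_S = 600


def build_request(
    prompt: str,
    image_url: Optional[str] = None,
    effort: str = "high",
    max_tokens: int = 16_000,  # illustrative "generous" limit, not a documented default
    web_search: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")

    # Standard OpenAI content parts: a text part, plus an optional image_url part.
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})

    request: Dict[str, Any] = {
        "model": "fugu-ultra",
        "reasoning_effort": effort,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": content}],
    }
    if web_search:
        # ASSUMPTION: tool_web_search is sent as a top-level boolean field.
        # Check the API docs for the actual shape before relying on this.
        request["extra_body"] = {"tool_web_search": True}
    return request


req = build_request("Describe this chart.", image_url="https://example.com/chart.png")
print(req["messages"][0]["content"][1]["type"])  # prints image_url

# Sending it with the OpenAI SDK would look roughly like:
#   client = OpenAI(base_url="https://api.empiriolabs.ai/v1",
#                   api_key="YOUR_EMPIRIOLABS_KEY", timeout=REQUEST_TIMEOUT_S)
#   resp = client.chat.completions.create(**req)
```

Validating the effort level locally gives a clear error before a slow, billed request is sent, and the long client timeout avoids giving up on a response that arrives all at once after a minute or two.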
Start building with Fugu Ultra in the Playground, or see the full parameter list and examples in the documentation.



