Fugu Ultra API: Pricing, Quickstart & Limits

Jun 22, 2026

EmpirioLabs AI

Fugu Ultra is Sakana AI's multi-agent system, now live on EmpirioLabs. Instead of answering from a single model, it coordinates a pool of expert models on every request and composes their work into one answer, an approach Sakana calls a conductor. It is built for hard, high-stakes problems: complex reasoning, code generation and review, research, and long multi-step tasks.

Fugu Ultra is available today through an OpenAI-compatible API, with a 1M token context window, text and image input, adjustable reasoning effort, function calling, JSON mode structured output, and built-in web search. Try it in the Playground or read the API docs.

Token usage and billing

Fugu Ultra uses more tokens than a single model, so it helps to understand how usage is counted before you start. Each request coordinates several expert models internally, and all of that work is folded into the standard input and output token counts you are billed on. In practice, even a short prompt reports more input tokens than the text you sent and more output tokens than the answer you receive, because the model's internal coordination is counted too. Billing is usage-based, per-token pricing on those input and output tokens; the exact, always-current rates, including the discounted cached-input rate, are on the model page and the pricing page.
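To make the billing arithmetic concrete, here is a minimal sketch of a cost estimate from the token counts a response reports. The rates below are placeholders, not real prices (the actual rates are on the pricing page), and the sketch assumes that cached tokens are reported as a subset of the input tokens, as in the standard OpenAI usage object. The `Rates` class and `estimate_cost` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Per-million-token prices in your billing currency (placeholder values only)."""
    input_per_million: float
    cached_input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  rates: Rates, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request from its reported token counts.

    Assumption: cached_tokens is a subset of prompt_tokens and is billed
    at the discounted cached-input rate instead of the normal input rate.
    """
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * rates.input_per_million
        + cached_tokens * rates.cached_input_per_million
        + completion_tokens * rates.output_per_million
    )
    return total / 1_000_000


# Placeholder rates for illustration; look up the real ones before budgeting.
example_rates = Rates(input_per_million=2.0,
                      cached_input_per_million=0.5,
                      output_per_million=8.0)
```

With an OpenAI-compatible response, the inputs would typically come from `resp.usage.prompt_tokens` and `resp.usage.completion_tokens`. Because internal coordination is included in those counts, estimating from the reported usage rather than from the length of your prompt gives a more realistic figure.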

Quickstart

Point any OpenAI-compatible client at the EmpirioLabs base URL and call fugu-ultra:

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fugu-ultra",
    "reasoning_effort": "high",
    "messages": [
      {"role": "user", "content": "Outline a plan to reproduce the results of a machine learning paper."}
    ]
  }'

The Python OpenAI SDK works without changes:

import os

from openai import OpenAI

# Read the key from the same environment variable the curl example uses.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

resp = client.chat.completions.create(
    model="fugu-ultra",
    reasoning_effort="high",  # accepted values: high, xhigh, max
    messages=[{"role": "user", "content": "Explain how TLS works, then critique your explanation."}],
)
print(resp.choices[0].message.content)

Good to know

Responses can take a while. Because Fugu Ultra runs several models per request, expect anywhere from a few seconds to a few minutes on complex prompts. The full answer is returned at once when the model finishes, not token by token. Streaming requests still work, but they deliver the complete response at the end rather than streaming tokens as they are generated.
Reasoning is always on. Set reasoning_effort to high, xhigh, or max. There is no off, low, or medium; xhigh and max apply the most effort for the hardest problems.
Leave room for the answer. Use a generous max_tokens, since a very small limit can truncate the answer or leave it empty.
Image input. Send images as standard OpenAI image_url content parts. Both uploads and URLs work.
Web search. Turn on tool_web_search for built-in web search. There is no separate search fee; its cost is included in the request's token count.
Long context. The model accepts up to a 1M token context, so you can feed it large documents and codebases.
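The notes above can be pulled together in a small request builder. This is a sketch under stated assumptions rather than the documented API: `build_request` and the two image helpers are hypothetical names, the default `max_tokens` value is an arbitrary generous number, "uploads" is interpreted here as base64 data URLs (the usual OpenAI convention for inline images), and `tool_web_search` is assumed to be a boolean field in the request body, passed through the OpenAI SDK's `extra_body` argument. Check the API docs for the exact parameter shapes.

```python
import base64

# Per the notes above: reasoning is always on, with no off, low, or medium.
ALLOWED_EFFORTS = {"high", "xhigh", "max"}


def image_part_from_url(url: str) -> dict:
    """Build a standard OpenAI image_url content part from a URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an image_url content part from raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return image_part_from_url(f"data:{mime_type};base64,{encoded}")


def build_request(prompt: str, *, image_parts=(), effort: str = "high",
                  max_tokens: int = 16000, web_search: bool = False) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")
    # Text first, then any images, all in a single user message.
    content = [{"type": "text", "text": prompt}, *image_parts]
    kwargs = {
        "model": "fugu-ultra",
        "reasoning_effort": effort,
        "max_tokens": max_tokens,  # arbitrary generous default, not a documented value
        "messages": [{"role": "user", "content": content}],
    }
    if web_search:
        # Assumption: tool_web_search is a boolean field in the JSON body.
        kwargs["extra_body"] = {"tool_web_search": True}
    return kwargs
```

The returned dict can be unpacked into the SDK call, for example `client.chat.completions.create(**build_request("Summarize this chart.", web_search=True))`. Since responses can take minutes, it is probably worth constructing the client with a longer `timeout` than you would use for a single-model endpoint.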

Start building with Fugu Ultra in the Playground, or see the full parameter list and examples in the documentation.
