Sakana Fugu Deep Dive — The June 22, 2026 'LLM Trained to Call Other LLMs' from Sakana AI: Dynamic Orchestration Across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro, Powered by the ICLR 2026 TRINITY / Conductor Papers, Claiming 73.7 on SWE-Bench Pro (Beating Opus 4.8), Shipping as Fugu / Fugu Ultra with $20 / $100 / $200 Subscription Tiers — EU/EEA Excluded Pending GDPR Compliance
Sakana AI officially launched Sakana Fugu on June 22, 2026 (fugu-release / product page / gihyo.jp / GIGAZINE). Critically, this is not a next-generation Japanese LLM — it is an LLM trained to call other LLMs, a 'conductor' model that dynamically orchestrates frontier models inside the loop. When you send a query, Fugu itself either (1) answers directly when it can, or (2) for complex multi-step tasks selects, dispatches, verifies, and integrates from an agent pool that includes GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro and others. Academic basis: two ICLR 2026 papers — TRINITY (an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles) and Conductor (RL-discovered coordination strategies expressed in natural language). Two variants: Fugu (everyday tasks, low latency) and Fugu Ultra (hardest problems, deep coordination — pool composition is fixed and cannot be excluded). Benchmarks: SWE-Bench Pro 73.7 (reported to beat Claude Opus 4.8, per XenoSpectrum), Terminal-Bench 2.1 above Anthropic's latest, Charxiv Reasoning above Claude Mythos Preview — but lags on Humanity's Last Exam (HLE). Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview," not blanket dominance. Pricing: Fugu Ultra at $5/M input ($10/M >272K) and $30/M output ($45/M >272K), plus subscriptions at Standard $20 / Pro $100 / Max $200 per month (both Fugu and Fugu Ultra). Enterprise is usage-based. OpenAI-compatible API at console.sakana.ai. Not available in the EU/EEA pending GDPR compliance; Japan-region usage works. The strategic point is structural resilience, not raw performance — escape from single-vendor dependence and diversification against export-control risk (directly continuing our Sakana Marlin column's Fable 5 export-restriction thread). BuildFastWithAI calls it 'the orchestration model that routes around export controls,' and Clanker Cloud frames it as 'Model Orchestration Is Becoming the Product.' Fugu's own parameter count, Japanese-specific benchmark scores (ELYZA / JMMLU / JMT-Bench), and individual statements from David Ha / Llion Jones are not yet confirmed, leaving 'thin wrapper over external APIs' criticism and independent verification as open questions.
TL;DR — Sakana Fugu in One Sentence
Sakana AI officially launched Sakana Fugu on June 22, 2026 (fugu-release / product page).
The single most important point: Fugu is not a next-generation Japanese LLM — it is an LLM trained to call other LLMs. A conductor-style model that dynamically orchestrates frontier models inside its loop.
Four points:
1. A new category: the orchestration model — Fugu picks, dispatches, verifies, and integrates from an agent pool of GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro and others 2. Academic basis: two ICLR 2026 papers — TRINITY (evolutionarily optimized LLM coordinator) + Conductor (RL-discovered coordination strategies) 3. Two variants — everyday-use Fugu and hardest-problem Fugu Ultra 4. The value prop is structural resilience, not raw performance — escape from single-vendor dependence and export-control risk
This column sits next to our Sakana Marlin coverage, Claude Fable 5 export-control suspension column, and same-day PLaMo 3.0 Prime piece as the June 22, 2026 "Sakana AI × Japanese AI front" cluster.
Release Overview — Fugu as Conductor
| Item | Value |
|---|---|
| GA date | June 22, 2026 |
| Distribution | OpenAI-compatible API + subscriptions (console.sakana.ai) |
| Variants | Fugu (everyday, low-latency) / Fugu Ultra (hardest problems, deep coordination) |
| Academic basis | ICLR 2026 papers TRINITY and Conductor |
| Relationship to Sakana Marlin | Separate line. Marlin (released early June 2026) is an autonomous research agent; Fugu is the conductor. Future plans hint at calling Marlin-style agents from inside Fugu's pool. |
Naming: Neither Sakana's official posts nor gihyo / GIGAZINE explicitly explain why "Fugu" (pufferfish). What's verifiable is that it fits the company's ongoing "Japanese fish names" theme (Sakana → Marlin → Fugu).
Architecture — An LLM Trained to Call LLMs
Fugu is "an LLM trained to call other LLMs". On a query, Fugu either:
- Answers directly when it can (recursive self-call permitted) - For complex multi-step work, selects / dispatches / verifies / integrates results from external LLMs
The agent pool includes (per gihyo): GPT-5.5 (OpenAI), Claude Opus 4.8 (Anthropic), Gemini 3.1 Pro (Google DeepMind), plus other frontier closed and open models.
Users can exclude specific providers / models from the pool for compliance (e.g., exclude Chinese models, exclude models incompatible with EU data transfer). But Fugu Ultra has a fixed pool that cannot be excluded — that's the cost of maximum coordinated performance.
[Loop Engineering](../columns/loop-engineering-ai-agent-paradigm-2026-06) lens: Fugu acts as the Outer-Loop orchestrator; the Inner Loop is run by the external LLMs. Maker / Checker separation (generation and verification on different models) is built into the product — a productized implementation of the Loop Engineering Maker-Checker pattern.
Academic Basis — TRINITY and Conductor (ICLR 2026)
TRINITY (ICLR 2026): an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles across multiple LLMs to span coding, math, reasoning, and knowledge tasks.
Conductor (ICLR 2026): uses RL to discover natural-language coordination strategies (inter-agent communication patterns and bespoke prompts). Continues Sakana's Evolutionary Model Merge / DiscoPOP line of work.
Fugu's own parameter count: undisclosed. By design — the bet is on coordination performance, not single-model size.
Context length: pricing has a distinct band above 272K tokens, indicating >272K support.
Benchmarks — Shoulder-to-Shoulder, Not Blanket Win
| Benchmark | Fugu Ultra | Comparison | Source |
|---|---|---|---|
| SWE-Bench Pro | 73.7 | Reported to beat Claude Opus 4.8 et al | XenoSpectrum |
| Terminal-Bench 2.1 | Above Anthropic's latest | Coding | SBBit |
| Charxiv Reasoning | Above Claude Mythos Preview | Complex-chart reasoning | SBBit |
| Humanity's Last Exam (HLE) | Lags | Broad academic knowledge | SBBit |
Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview" — frontier-class parity with selective wins, not blanket dominance.
Japanese-specific benchmarks (ELYZA-tasks-100 / JMMLU / Japanese MT-Bench) are not published by official, gihyo, GIGAZINE, or SBBit. Fugu is a general-purpose coordinator, not a Japanese-language specialist. For Japanese-language strength, compare against PLaMo 3.0 Prime or Liquid AI LFM2.5-J.
Distribution and Pricing
Distribution:
- Closed weights, API only — no Hugging Face open-weights drop confirmed - OpenAI-compatible endpoint — drops into Claude Code–compatible clients, Cursor, Aider, etc. - Console: console.sakana.ai
Fugu Ultra token pricing:
| Item | Standard | >272K |
|---|---|---|
| Input | $5 / M | $10 / M |
| Output | $30 / M | $45 / M |
Subscriptions (individual):
- Standard $20 / month - Pro $100 / month - Max $200 / month
All three include both Fugu and Fugu Ultra. The cost story: for $20–$200, you get effectively bundled access to GPT-5.5 / Opus 4.8 / Gemini 3.1 Pro through the orchestrator — same tier as Claude Pro / Max but covering multiple providers.
Enterprise: usage-based (model usage + agent count).
Geographic restriction: not available in the EU / EEA while GDPR compliance is in progress. Japan-region usage is supported.
Strategic Value — Structural Resilience over Performance
The most important strategic point about Fugu is that its core value proposition is structural resilience, not raw performance.
Export controls and geopolitical risk: official messaging, XenoSpectrum, and BuildFastWithAI all emphasize escape from single-vendor dependence and export-control risk diversification. The Claude Fable 5 / Mythos 5 sudden suspension under a US government export directive in May 2026 turned single-vendor risk from theory into a documented precedent. Fugu is designed with that precedent as direct motivation.
Market positioning: stepping out of the "single strongest model" race and competing one layer up at the meta-model layer. Clanker Cloud frames it as "Model Orchestration Is Becoming the Product."
David Ha / Llion Jones quotes: signed comments exist in the official release; no extractable quoted statements were located in the gihyo / GIGAZINE / SBBit / XenoSpectrum trace. Personal X accounts (@hardmaru, @lliondj) need follow-up.
GENIAC / NEDO / METI / SoftBank funding: not mentioned in this release. No update on the $1.5B SoftBank round in the Fugu materials.
Risks and Reservations
1. "Thin wrapper over external APIs" critique room — end-state cost likely depends on GPT-5.5 / Opus 4.8 / Gemini 3.1 Pro token billing. Whether $20 / month genuinely buys Opus-4.8-class output at scale depends on rate-limit and fair-use policy that GA hasn't yet stress-tested in public. 2. Independent benchmark verification — at this stage, only ClassMethod's hands-on early review exists; broader third-party replication is pending. 3. HLE lag — broad academic knowledge tasks remain below frontier models; the strong-vs-weak split is clean. 4. Evolutionary-merge reproducibility debate — academic skepticism toward Evolutionary Model Merge carries over; the "evolved" portion of TRINITY is open to scrutiny. 5. EU / EEA unavailable — pending GDPR work. 6. No Japanese-language benchmarks — expected given the positioning, but it weakens the "domestic LLM" pitch.
Recommended Adoption Pattern (Oflight)
What we recommend in our AI consulting and software development practice — buying structural resilience, not raw quality:
Use case 1 — single gateway for multi-vendor strategy: for organizations already mixing several frontier vendors, consolidate contracts, billing, and observability under Fugu. The OpenAI-compatible API layer absorbs export-control / API-key-leak / lock-in risk.
Use case 2 — productized Loop Engineering Maker-Checker: skip building your own Maker-Checker loop. Thinker / Worker / Verifier are already in the box per the TRINITY paper.
Use case 3 — coding-heavy workloads with Fugu Ultra: SWE-Bench Pro 73.7 / Terminal-Bench 2.1 are the claimed-to-beat-Opus territory. Validate on your own project with a real PoC.
Avoid: broad academic-knowledge work (HLE lag), EU-resident orgs (no availability), and sensitive Japan-domestic workloads (cross-border data scrutiny against the agent pool needs the same diligence as for PLaMo 3.0 Prime).
FAQ
Q1. Is Fugu a Japanese-specialized LLM? A. No. General-purpose orchestrator. For Japanese strength, look at PLaMo 3.0 Prime or Liquid AI LFM2.5-J. Q2. What does "an LLM trained to call other LLMs" mean? A. Fugu is itself an LLM, but its training objective is deciding how to call external LLMs to solve a problem, not directly producing the answer. TRINITY provides the academic mechanism (dynamic Thinker / Worker / Verifier role assignment). Q3. Is $20 / month really Opus-4.8-class output? A. Structurally possible because Opus 4.8 is in the pool. In practice, rate-limit and fair-use policy will gate how often Fugu actually escalates to Opus 4.8 — that's exactly what needs measuring in the first weeks post-GA. Q4. Can specific models be excluded from the pool? A. In Fugu (regular): yes — compliance-driven exclusions are supported. In Fugu Ultra: no — fixed pool for maximum coordinated performance. Q5. EU availability? A. No — GDPR work in progress. Japan, US, and APAC are supported. Q6. Relationship to the Claude Fable 5 export suspension? A. Direct precedent and motivation. The Fable 5 / Mythos 5 suspension turned single-vendor risk into documented reality. Fugu is engineered to fall over to GPT / Gemini automatically if Claude becomes unavailable — that resilience is the pitch. Q7. Relationship to Sakana Marlin? A. Different product line. Marlin is an autonomous research agent; Fugu is the conductor. They are complementary — future plans hint at putting Marlin-style agents inside Fugu's pool. Q8. Relationship to PLaMo 3.0 Prime (same-day release)? A. Different concepts. PLaMo is a single Japanese LLM; Fugu is a multi-LLM orchestrator. Not direct competitors — PLaMo could plausibly be added to Fugu's pool. The same-day release on June 22, 2026 appears to be coincidental.
Bottom Line
Sakana Fugu is the leading example of a new AI category — the orchestration model — released June 22, 2026. Its design thesis is "an LLM trained to call other LLMs," academically grounded in the ICLR 2026 TRINITY and Conductor papers, dispatching internally across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro. SWE-Bench Pro 73.7 is the headline; HLE lag is the honest counterweight; Sakana's own framing — "shoulder-to-shoulder with Fable 5 and Mythos Preview" — is the realistic register.
The single most important strategic point is that Fugu's real value is structural resilience, not raw quality. The Claude Fable 5 / Mythos 5 export-control suspension of May 2026 is the precedent driving the design — diversify away from single-vendor / single-jurisdiction risk. Clanker Cloud captures it well: "Model Orchestration Is Becoming the Product."
Practical adoption paths for Japanese enterprises: (1) a single gateway over an existing multi-vendor stack, (2) productized [Loop Engineering](../columns/loop-engineering-ai-agent-paradigm-2026-06) Maker-Checker, and (3) Fugu Ultra PoC on coding-heavy projects. Reservations: the "thin API wrapper" critique, rate-limit unknowns, HLE lag, and EU non-availability all remain — measurement on your own workload is non-negotiable.
References
Primary: - Sakana AI — fugu-release - Sakana AI — product page - console.sakana.ai - Sakana AI - Sakana AI Blog - Hugging Face SakanaAI - David Ha X / Llion Jones X Third-party: - gihyo.jp - GIGAZINE - SBBit - XenoSpectrum - Dealroom - ClassMethod DevelopersIO - StartupHub.ai - BuildFastWithAI - Clanker Cloud - talentcloud.jp Related: - PLaMo 3.0 Prime (same-day release) - Sakana Marlin - Claude Fable 5 export-control suspension - Loop Engineering - Kimi K2.7-Code - Liquid AI Japanese-specialized models Note: Fugu's own parameter count, Japanese benchmark scores (ELYZA / JMMLU / JMT-Bench), specific quoted statements from David Ha / Llion Jones / Yi Tay, GENIAC / NEDO ties, and any new funding-round details are not confirmable as of June 22, 2026. Re-check Sakana AI's official blog before any production decision.
Feel free to contact us
Contact Us