株式会社オブライト
AI2026-06-22

Sakana Fugu Deep Dive — The June 22, 2026 'LLM Trained to Call Other LLMs' from Sakana AI: Dynamic Orchestration Across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro, Powered by the ICLR 2026 TRINITY / Conductor Papers, Claiming 73.7 on SWE-Bench Pro (Beating Opus 4.8), Shipping as Fugu / Fugu Ultra with $20 / $100 / $200 Subscription Tiers — EU/EEA Excluded Pending GDPR Compliance

Sakana AI officially launched Sakana Fugu on June 22, 2026 (fugu-release / product page / gihyo.jp / GIGAZINE). Critically, this is not a next-generation Japanese LLM — it is an LLM trained to call other LLMs, a 'conductor' model that dynamically orchestrates frontier models inside the loop. When you send a query, Fugu itself either (1) answers directly when it can, or (2) for complex multi-step tasks selects, dispatches, verifies, and integrates from an agent pool that includes GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro and others. Academic basis: two ICLR 2026 papers — TRINITY (an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles) and Conductor (RL-discovered coordination strategies expressed in natural language). Two variants: Fugu (everyday tasks, low latency) and Fugu Ultra (hardest problems, deep coordination — pool composition is fixed and cannot be excluded). Benchmarks: SWE-Bench Pro 73.7 (reported to beat Claude Opus 4.8, per XenoSpectrum), Terminal-Bench 2.1 above Anthropic's latest, Charxiv Reasoning above Claude Mythos Preview — but lags on Humanity's Last Exam (HLE). Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview," not blanket dominance. Pricing: Fugu Ultra at $5/M input ($10/M >272K) and $30/M output ($45/M >272K), plus subscriptions at Standard $20 / Pro $100 / Max $200 per month (both Fugu and Fugu Ultra). Enterprise is usage-based. OpenAI-compatible API at console.sakana.ai. Not available in the EU/EEA pending GDPR compliance; Japan-region usage works. The strategic point is structural resilience, not raw performance — escape from single-vendor dependence and diversification against export-control risk (directly continuing our Sakana Marlin column's Fable 5 export-restriction thread). BuildFastWithAI calls it 'the orchestration model that routes around export controls,' and Clanker Cloud frames it as 'Model Orchestration Is Becoming the Product.' Fugu's own parameter count, Japanese-specific benchmark scores (ELYZA / JMMLU / JMT-Bench), and individual statements from David Ha / Llion Jones are not yet confirmed, leaving 'thin wrapper over external APIs' criticism and independent verification as open questions.


TL;DR — Sakana Fugu in One Sentence

Sakana AI officially launched Sakana Fugu on June 22, 2026 (fugu-release / product page).

The single most important point: Fugu is not a next-generation Japanese LLM — it is an LLM trained to call other LLMs. A conductor-style model that dynamically orchestrates frontier models inside its loop.

Four points:

1. A new category: the orchestration model — Fugu picks, dispatches, verifies, and integrates from an agent pool of GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro and others 2. Academic basis: two ICLR 2026 papers — TRINITY (evolutionarily optimized LLM coordinator) + Conductor (RL-discovered coordination strategies) 3. Two variants — everyday-use Fugu and hardest-problem Fugu Ultra 4. The value prop is structural resilience, not raw performance — escape from single-vendor dependence and export-control risk

This column sits next to our Sakana Marlin coverage, Claude Fable 5 export-control suspension column, and same-day PLaMo 3.0 Prime piece as the June 22, 2026 "Sakana AI × Japanese AI front" cluster.

Release Overview — Fugu as Conductor

ItemValue
GA dateJune 22, 2026
DistributionOpenAI-compatible API + subscriptions (console.sakana.ai)
VariantsFugu (everyday, low-latency) / Fugu Ultra (hardest problems, deep coordination)
Academic basisICLR 2026 papers TRINITY and Conductor
Relationship to Sakana MarlinSeparate line. Marlin (released early June 2026) is an autonomous research agent; Fugu is the conductor. Future plans hint at calling Marlin-style agents from inside Fugu's pool.

Naming: Neither Sakana's official posts nor gihyo / GIGAZINE explicitly explain why "Fugu" (pufferfish). What's verifiable is that it fits the company's ongoing "Japanese fish names" theme (Sakana → Marlin → Fugu).

Architecture — An LLM Trained to Call LLMs

Fugu is "an LLM trained to call other LLMs". On a query, Fugu either:

- Answers directly when it can (recursive self-call permitted) - For complex multi-step work, selects / dispatches / verifies / integrates results from external LLMs

The agent pool includes (per gihyo): GPT-5.5 (OpenAI), Claude Opus 4.8 (Anthropic), Gemini 3.1 Pro (Google DeepMind), plus other frontier closed and open models.

Users can exclude specific providers / models from the pool for compliance (e.g., exclude Chinese models, exclude models incompatible with EU data transfer). But Fugu Ultra has a fixed pool that cannot be excluded — that's the cost of maximum coordinated performance.

[Loop Engineering](../columns/loop-engineering-ai-agent-paradigm-2026-06) lens: Fugu acts as the Outer-Loop orchestrator; the Inner Loop is run by the external LLMs. Maker / Checker separation (generation and verification on different models) is built into the product — a productized implementation of the Loop Engineering Maker-Checker pattern.

Academic Basis — TRINITY and Conductor (ICLR 2026)

TRINITY (ICLR 2026): an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles across multiple LLMs to span coding, math, reasoning, and knowledge tasks.

Conductor (ICLR 2026): uses RL to discover natural-language coordination strategies (inter-agent communication patterns and bespoke prompts). Continues Sakana's Evolutionary Model Merge / DiscoPOP line of work.

Fugu's own parameter count: undisclosed. By design — the bet is on coordination performance, not single-model size.

Context length: pricing has a distinct band above 272K tokens, indicating >272K support.

Benchmarks — Shoulder-to-Shoulder, Not Blanket Win

BenchmarkFugu UltraComparisonSource
SWE-Bench Pro73.7Reported to beat Claude Opus 4.8 et alXenoSpectrum
Terminal-Bench 2.1Above Anthropic's latestCodingSBBit
Charxiv ReasoningAbove Claude Mythos PreviewComplex-chart reasoningSBBit
Humanity's Last Exam (HLE)LagsBroad academic knowledgeSBBit

Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview" — frontier-class parity with selective wins, not blanket dominance.

Japanese-specific benchmarks (ELYZA-tasks-100 / JMMLU / Japanese MT-Bench) are not published by official, gihyo, GIGAZINE, or SBBit. Fugu is a general-purpose coordinator, not a Japanese-language specialist. For Japanese-language strength, compare against PLaMo 3.0 Prime or Liquid AI LFM2.5-J.

Distribution and Pricing

Distribution:

- Closed weights, API only — no Hugging Face open-weights drop confirmed - OpenAI-compatible endpoint — drops into Claude Code–compatible clients, Cursor, Aider, etc. - Console: console.sakana.ai

Fugu Ultra token pricing:

ItemStandard>272K
Input$5 / M$10 / M
Output$30 / M$45 / M

Subscriptions (individual):

- Standard $20 / month - Pro $100 / month - Max $200 / month

All three include both Fugu and Fugu Ultra. The cost story: for $20–$200, you get effectively bundled access to GPT-5.5 / Opus 4.8 / Gemini 3.1 Pro through the orchestrator — same tier as Claude Pro / Max but covering multiple providers.

Enterprise: usage-based (model usage + agent count).

Geographic restriction: not available in the EU / EEA while GDPR compliance is in progress. Japan-region usage is supported.

Strategic Value — Structural Resilience over Performance

The most important strategic point about Fugu is that its core value proposition is structural resilience, not raw performance.

Export controls and geopolitical risk: official messaging, XenoSpectrum, and BuildFastWithAI all emphasize escape from single-vendor dependence and export-control risk diversification. The Claude Fable 5 / Mythos 5 sudden suspension under a US government export directive in May 2026 turned single-vendor risk from theory into a documented precedent. Fugu is designed with that precedent as direct motivation.

Market positioning: stepping out of the "single strongest model" race and competing one layer up at the meta-model layer. Clanker Cloud frames it as "Model Orchestration Is Becoming the Product."

David Ha / Llion Jones quotes: signed comments exist in the official release; no extractable quoted statements were located in the gihyo / GIGAZINE / SBBit / XenoSpectrum trace. Personal X accounts (@hardmaru, @lliondj) need follow-up.

GENIAC / NEDO / METI / SoftBank funding: not mentioned in this release. No update on the $1.5B SoftBank round in the Fugu materials.

Risks and Reservations

1. "Thin wrapper over external APIs" critique room — end-state cost likely depends on GPT-5.5 / Opus 4.8 / Gemini 3.1 Pro token billing. Whether $20 / month genuinely buys Opus-4.8-class output at scale depends on rate-limit and fair-use policy that GA hasn't yet stress-tested in public. 2. Independent benchmark verification — at this stage, only ClassMethod's hands-on early review exists; broader third-party replication is pending. 3. HLE lag — broad academic knowledge tasks remain below frontier models; the strong-vs-weak split is clean. 4. Evolutionary-merge reproducibility debate — academic skepticism toward Evolutionary Model Merge carries over; the "evolved" portion of TRINITY is open to scrutiny. 5. EU / EEA unavailable — pending GDPR work. 6. No Japanese-language benchmarks — expected given the positioning, but it weakens the "domestic LLM" pitch.

Recommended Adoption Pattern (Oflight)

What we recommend in our AI consulting and software development practice — buying structural resilience, not raw quality:

Use case 1 — single gateway for multi-vendor strategy: for organizations already mixing several frontier vendors, consolidate contracts, billing, and observability under Fugu. The OpenAI-compatible API layer absorbs export-control / API-key-leak / lock-in risk.

Use case 2 — productized Loop Engineering Maker-Checker: skip building your own Maker-Checker loop. Thinker / Worker / Verifier are already in the box per the TRINITY paper.

Use case 3 — coding-heavy workloads with Fugu Ultra: SWE-Bench Pro 73.7 / Terminal-Bench 2.1 are the claimed-to-beat-Opus territory. Validate on your own project with a real PoC.

Avoid: broad academic-knowledge work (HLE lag), EU-resident orgs (no availability), and sensitive Japan-domestic workloads (cross-border data scrutiny against the agent pool needs the same diligence as for PLaMo 3.0 Prime).

FAQ

Q1. Is Fugu a Japanese-specialized LLM? A. No. General-purpose orchestrator. For Japanese strength, look at PLaMo 3.0 Prime or Liquid AI LFM2.5-J. Q2. What does "an LLM trained to call other LLMs" mean? A. Fugu is itself an LLM, but its training objective is deciding how to call external LLMs to solve a problem, not directly producing the answer. TRINITY provides the academic mechanism (dynamic Thinker / Worker / Verifier role assignment). Q3. Is $20 / month really Opus-4.8-class output? A. Structurally possible because Opus 4.8 is in the pool. In practice, rate-limit and fair-use policy will gate how often Fugu actually escalates to Opus 4.8 — that's exactly what needs measuring in the first weeks post-GA. Q4. Can specific models be excluded from the pool? A. In Fugu (regular): yes — compliance-driven exclusions are supported. In Fugu Ultra: no — fixed pool for maximum coordinated performance. Q5. EU availability? A. No — GDPR work in progress. Japan, US, and APAC are supported. Q6. Relationship to the Claude Fable 5 export suspension? A. Direct precedent and motivation. The Fable 5 / Mythos 5 suspension turned single-vendor risk into documented reality. Fugu is engineered to fall over to GPT / Gemini automatically if Claude becomes unavailable — that resilience is the pitch. Q7. Relationship to Sakana Marlin? A. Different product line. Marlin is an autonomous research agent; Fugu is the conductor. They are complementary — future plans hint at putting Marlin-style agents inside Fugu's pool. Q8. Relationship to PLaMo 3.0 Prime (same-day release)? A. Different concepts. PLaMo is a single Japanese LLM; Fugu is a multi-LLM orchestrator. Not direct competitors — PLaMo could plausibly be added to Fugu's pool. The same-day release on June 22, 2026 appears to be coincidental.

Bottom Line

Sakana Fugu is the leading example of a new AI category — the orchestration model — released June 22, 2026. Its design thesis is "an LLM trained to call other LLMs," academically grounded in the ICLR 2026 TRINITY and Conductor papers, dispatching internally across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro. SWE-Bench Pro 73.7 is the headline; HLE lag is the honest counterweight; Sakana's own framing — "shoulder-to-shoulder with Fable 5 and Mythos Preview" — is the realistic register.

The single most important strategic point is that Fugu's real value is structural resilience, not raw quality. The Claude Fable 5 / Mythos 5 export-control suspension of May 2026 is the precedent driving the design — diversify away from single-vendor / single-jurisdiction risk. Clanker Cloud captures it well: "Model Orchestration Is Becoming the Product."

Practical adoption paths for Japanese enterprises: (1) a single gateway over an existing multi-vendor stack, (2) productized [Loop Engineering](../columns/loop-engineering-ai-agent-paradigm-2026-06) Maker-Checker, and (3) Fugu Ultra PoC on coding-heavy projects. Reservations: the "thin API wrapper" critique, rate-limit unknowns, HLE lag, and EU non-availability all remain — measurement on your own workload is non-negotiable.

References

Primary: - Sakana AI — fugu-release - Sakana AI — product page - console.sakana.ai - Sakana AI - Sakana AI Blog - Hugging Face SakanaAI - David Ha X / Llion Jones X Third-party: - gihyo.jp - GIGAZINE - SBBit - XenoSpectrum - Dealroom - ClassMethod DevelopersIO - StartupHub.ai - BuildFastWithAI - Clanker Cloud - talentcloud.jp Related: - PLaMo 3.0 Prime (same-day release) - Sakana Marlin - Claude Fable 5 export-control suspension - Loop Engineering - Kimi K2.7-Code - Liquid AI Japanese-specialized models Note: Fugu's own parameter count, Japanese benchmark scores (ELYZA / JMMLU / JMT-Bench), specific quoted statements from David Ha / Llion Jones / Yi Tay, GENIAC / NEDO ties, and any new funding-round details are not confirmable as of June 22, 2026. Re-check Sakana AI's official blog before any production decision.

Feel free to contact us

Contact Us