株式会社オブライト

Articles tagged "Sakana Fugu"

1 article

Sakana Fugu Deep Dive — The June 22, 2026 'LLM Trained to Call Other LLMs' from Sakana AI: Dynamic Orchestration Across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro, Powered by the ICLR 2026 TRINITY / Conductor Papers, Claiming 73.7 on SWE-Bench Pro (Beating Opus 4.8), Shipping as Fugu / Fugu Ultra with $20 / $100 / $200 Subscription Tiers — EU/EEA Excluded Pending GDPR Compliance

**Sakana AI officially launched Sakana Fugu on June 22, 2026** ([fugu-release](https://sakana.ai/fugu-release/) / [product page](https://sakana.ai/fugu/) / [gihyo.jp](https://gihyo.jp/article/2026/06/sakana-fugu) / [GIGAZINE](https://gigazine.net/gsc_news/en/20260622-sakana-fugu-multi-agent-system-ai)). Critically, this is **not a next-generation Japanese LLM: it is an LLM trained to call other LLMs**, a 'conductor' model that dynamically orchestrates frontier models inside the loop. When you send a query, Fugu either (1) answers directly when it can, or (2) for complex multi-step tasks **selects, dispatches, verifies, and integrates** work across an agent pool that includes **GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro**, and others.

**Academic basis**: two ICLR 2026 papers, **TRINITY** (an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles) and **Conductor** (RL-discovered coordination strategies expressed in natural language).

**Two variants**: **Fugu** (everyday tasks, low latency) and **Fugu Ultra** (hardest problems, deep coordination; the pool composition is fixed, and individual models cannot be excluded).

**Benchmarks**: **SWE-Bench Pro 73.7** (reported to beat Claude Opus 4.8, per XenoSpectrum), Terminal-Bench 2.1 above Anthropic's latest, and Charxiv Reasoning above Claude Mythos Preview, but it **lags on Humanity's Last Exam (HLE)**. Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview," not blanket dominance.

**Pricing**: Fugu Ultra at **$5/M input tokens ($10/M >272K) and $30/M output tokens ($45/M >272K)**, plus **subscriptions at Standard $20 / Pro $100 / Max $200 per month** covering both Fugu and Fugu Ultra. Enterprise pricing is usage-based. An **OpenAI-compatible API** is available at console.sakana.ai. Fugu is **not available in the EU/EEA** pending GDPR compliance; usage from the Japan region works.

**The strategic point is structural resilience, not raw performance**: escaping single-vendor dependence and diversifying against export-control risk (a direct continuation of the Fable 5 export-restriction thread in our Sakana Marlin column). BuildFastWithAI calls it 'the orchestration model that routes around export controls,' and Clanker Cloud frames it as 'Model Orchestration Is Becoming the Product.' **Fugu's own parameter count, Japanese-specific benchmark scores (ELYZA / JMMLU / JMT-Bench), and individual statements from David Ha / Llion Jones are not yet confirmed**, so the 'thin wrapper over external APIs' criticism and the need for independent verification remain open questions.
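
To make the Fugu Ultra per-token rates quoted above concrete, here is a minimal Python sketch that estimates the cost of a single request. The rates ($5 / $30 per million input / output tokens, rising to $10 / $45 above 272K) are taken from the figures in this article. Everything else is an assumption for illustration: the names `Rates` and `estimate_cost_usd` are hypothetical, and in particular the sketch assumes the whole request is billed at the higher rates once its input exceeds 272K tokens, which the article does not spell out.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
# Threshold from the article's ">272K" note; assumed here to refer to input tokens.
LONG_CONTEXT_THRESHOLD = 272_000


@dataclass(frozen=True)
class Rates:
    """USD price per one million tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float


# Fugu Ultra rates as quoted in the article.
STANDARD = Rates(input_per_m=5.0, output_per_m=30.0)
LONG_CONTEXT = Rates(input_per_m=10.0, output_per_m=45.0)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough per-request cost estimate in USD.

    ASSUMPTION: once the input exceeds the 272K threshold, the entire
    request (input and output) is billed at the long-context rates.
    The actual billing rule may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = LONG_CONTEXT if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    total = input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    print(f"100K in / 10K out: ${estimate_cost_usd(100_000, 10_000):.2f}")
    print(f"300K in / 10K out: ${estimate_cost_usd(300_000, 10_000):.2f}")
```

Under these assumptions, a request with 100K input and 10K output tokens comes to about $0.80, while a 300K-input request with the same output comes to about $3.45. That gap is worth keeping in mind when weighing usage-based billing against the flat subscription tiers.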

Tags: Sakana AI, Sakana Fugu, Multi-Agent Orchestration