
株式会社オブライト

Column

Useful articles about SEO, Web Development, and IT

344 articles

Sakana Fugu Deep Dive — The June 22, 2026 'LLM Trained to Call Other LLMs' from Sakana AI: Dynamic Orchestration Across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro, Powered by the ICLR 2026 TRINITY / Conductor Papers, Claiming 73.7 on SWE-Bench Pro (Beating Opus 4.8), Shipping as Fugu / Fugu Ultra with $20 / $100 / $200 Subscription Tiers — EU/EEA Excluded Pending GDPR Compliance

Sakana AI officially launched Sakana Fugu on June 22, 2026 (fugu-release / product page / gihyo.jp / GIGAZINE). Critically, this is not a next-generation Japanese LLM. It is an LLM trained to call other LLMs: a 'conductor' model that dynamically orchestrates frontier models inside the loop. When you send a query, Fugu either (1) answers directly when it can, or (2) for complex multi-step tasks, selects, dispatches, verifies, and integrates work from an agent pool that includes GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro, and others. The academic basis is two ICLR 2026 papers: TRINITY (an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles) and Conductor (RL-discovered coordination strategies expressed in natural language).

There are two variants: Fugu (everyday tasks, low latency) and Fugu Ultra (hardest problems, deep coordination; its model pool is fixed, and individual models cannot be excluded). On benchmarks, Fugu reports 73.7 on SWE-Bench Pro (reported to beat Claude Opus 4.8, per XenoSpectrum), a Terminal-Bench 2.1 score above Anthropic's latest model, and a CharXiv Reasoning score above Claude Mythos Preview, but it lags on Humanity's Last Exam (HLE). Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview," not blanket dominance.

Pricing: Fugu Ultra costs $5/M input tokens ($10/M above 272K) and $30/M output tokens ($45/M above 272K), with subscriptions at Standard $20 / Pro $100 / Max $200 per month (covering both Fugu and Fugu Ultra). Enterprise pricing is usage-based. An OpenAI-compatible API is available at console.sakana.ai. The service is not available in the EU/EEA pending GDPR compliance; Japan-region usage works.

The strategic point is structural resilience rather than raw performance: escaping single-vendor dependence and diversifying against export-control risk (directly continuing the Fable 5 export-restriction thread from our Sakana Marlin column). BuildFastWithAI calls it 'the orchestration model that routes around export controls,' and Clanker Cloud frames it as 'Model Orchestration Is Becoming the Product.'
Fugu's own parameter count, Japanese-specific benchmark scores (ELYZA / JMMLU / JMT-Bench), and individual statements from David Ha / Llion Jones are not yet confirmed, leaving 'thin wrapper over external APIs' criticism and independent verification as open questions.
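To make the tiered Fugu Ultra rates above concrete, here is a minimal cost-estimator sketch. The summary does not say how the 272K threshold is applied, so this assumes that crossing 272K input tokens moves the whole request (input and output) onto the higher rates; the function name and that tiering rule are illustrative, not Sakana's billing logic.

```python
# Sketch: per-request cost estimate for Fugu Ultra's quoted list prices.
# ASSUMPTION: once a request's input exceeds 272K tokens, the whole request
# (input and output) is billed at the higher long-context rates.
LONG_CONTEXT_THRESHOLD = 272_000  # tokens

# USD per 1M tokens as (standard rate, long-context rate)
INPUT_RATE = (5.00, 10.00)
OUTPUT_RATE = (30.00, 45.00)


def fugu_ultra_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single Fugu Ultra request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Index 1 selects the long-context tier, index 0 the standard tier.
    tier = 1 if input_tokens > LONG_CONTEXT_THRESHOLD else 0
    usd = (input_tokens * INPUT_RATE[tier] + output_tokens * OUTPUT_RATE[tier]) / 1_000_000
    return round(usd, 6)


if __name__ == "__main__":
    print(fugu_ultra_cost(100_000, 10_000))  # standard tier, prints 0.8
    print(fugu_ultra_cost(300_000, 10_000))  # long-context tier, prints 3.45
```

If the higher rates instead apply only to the tokens beyond 272K, the single tier check would need to become a split calculation.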

Sakana AI · Sakana Fugu · Multi-Agent Orchestration · +4

[Update 2026-06-16: Paused] Anthropic Pauses the June 15 Claude Agent SDK Credit Pool Split — Official Help Center Notice Reverts Behavior to Subscription Usage Limits, Previously Announced $20 / $100 / $200 Monthly Credits No Longer Available

June 16, 2026 Update: On the very day of enforcement (June 15, 2026), Anthropic paused the planned split of Claude Agent SDK, claude -p, GitHub Actions, and third-party app (OpenClaw, Zed, Conductor, etc.) usage from subscription rate limits. The official Help Center article was amended with: "Update June 15: We are pausing the changes to Claude Agent SDK usage described below. For now, nothing has changed: Claude Agent SDK, claude -p, and third-party app usage still draw from your subscription's usage limits. The previously announced monthly credit, which would have been available to eligible claimants in connection with these changes, isn't available. We are working to update the plan to better support how users build with Claude subscriptions. When we have an update, we will share it before anything takes effect." The previously announced monthly credits (Pro $20 / Max 5x $100 / Max 20x $200 / Team $20-100 / Enterprise $200) were not distributed, and programmatic usage once again draws from standard subscription limits. The change is officially a pause, not a full rollback: Anthropic says it is reworking the plan and will share details before anything new ships. The backlash that triggered the pause was substantial. Community estimates projected effective price hikes of 12-175x against API-rate equivalents, a post by Anthropic engineer Lydia Hallie quickly drew a Community Note on X, and Reddit r/ClaudeAI, HN, and The New Stack all carried critical coverage. This is Anthropic's third subscription-policy reversal of 2026 (the January OAuth block reversed within days, the April 4 third-party agent ban reversed within 24 hours, and now the May 14 compromise credit pool paused on its June 15 enforcement day). This column preserves the original announced design while adding a detailed reversal section covering the timeline, operational implications, and whether the "turn Extra Usage auto-billing off" guidance still applies.

Anthropic · Claude · Claude Code · +4

Sakana AI Marlin Deep Dive — Japan's 'Virtual CSO' Ultra Deep Research Agent Explained

Sakana AI's first commercial product 'Marlin,' launched June 15, 2026, is an autonomous research agent — not an LLM. Combining AB-MCTS (Adaptive Branching Monte Carlo Tree Search) with multi-LLM collaboration across OpenAI o4-mini, Google Gemini 2.5 Pro, and DeepSeek R1-0528, Marlin operates autonomously for up to ~8 hours per task to generate tens-to-100+ page reports and executive slides. Designed for financial institutions, corporate planning, consulting, and think tanks, it differs fundamentally from OpenAI Deep Research and Gemini Deep Research in both purpose and architecture. This guide covers everything from its technical design to pricing, competitor comparison, and what it means for Japanese enterprises.
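AB-MCTS centers on an adaptive choice at each step of the search: "go wider" (generate a fresh answer) or "go deeper" (refine an existing one). The sketch below reduces that choice to a two-armed Thompson-sampling loop with Beta posteriors and drops the tree entirely. `toy_score` is a hypothetical stand-in for an LLM call plus an evaluator, the fractional Beta update is a simplification, and nothing here reflects Marlin's actual implementation.

```python
import random

# Toy sketch of the "wider vs. deeper" decision behind AB-MCTS, reduced to a
# two-armed Thompson-sampling loop (no tree). `toy_score` is a hypothetical
# stand-in for "ask an LLM, then score the answer in [0, 1]".


def toy_score(action: str, depth: int, rng: random.Random) -> float:
    """Hypothetical evaluator: refinements tend to improve with depth."""
    if action == "wider":
        return rng.random() * 0.6  # fresh attempt from scratch
    return min(1.0, 0.4 + 0.1 * depth + rng.random() * 0.2)  # refinement


def wider_or_deeper(score_fn, budget: int, seed: int = 0):
    rng = random.Random(seed)
    # Beta(alpha, beta) posterior per action, starting from a uniform prior.
    posterior = {"wider": [1.0, 1.0], "deeper": [1.0, 1.0]}
    counts = {"wider": 0, "deeper": 0}
    best = None  # (score, depth) of the best answer so far
    for _ in range(budget):
        # Thompson sampling: draw a plausible success rate per action, take the max.
        draws = {a: rng.betavariate(*ab) for a, ab in posterior.items()}
        action = max(draws, key=draws.get)
        if best is None:
            action = "wider"  # nothing to refine on the first step
        # "deeper" refines the current best answer one level further.
        depth = 0 if action == "wider" else best[1] + 1
        score = score_fn(action, depth, rng)
        # Simplified soft update: treat the score as a fractional success.
        posterior[action][0] += score
        posterior[action][1] += 1.0 - score
        counts[action] += 1
        if best is None or score > best[0]:
            best = (score, depth)
    return best, counts


if __name__ == "__main__":
    best, counts = wider_or_deeper(toy_score, budget=30)
    print(best, counts)
```

Because deeper refinements score higher in this toy evaluator, the sampler tends to shift budget toward "deeper" as evidence accumulates; with a different evaluator it would shift the other way, which is the adaptive behavior the name refers to.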

Sakana AI · Marlin · Ultra Deep Research · +4

Claude Fable 5 and Mythos 5 Suspended Under US Export Control Directive — Forced Recall Just 3 Days After Launch

On June 12, 2026 at 17:21 ET, Anthropic received an export control directive from the US Department of Commerce Bureau of Industry and Security (BIS) and immediately suspended Claude Fable 5 and Mythos 5 for all customers. Issued just three days after the models' release, this marks what multiple outlets describe as the first publicly known instance of direct US federal government intervention in a commercially deployed frontier AI model. This column covers the legal nature of the directive, the government's rationale and Anthropic's rebuttal, impact scope across API, Bedrock, and Vertex, alternative model options, and practical implications for Japanese enterprises.

Anthropic · Claude Fable 5 · Claude Mythos 5 · +4

Kimi K2.7-Code Deep Dive — Moonshot AI's June 12, 2026 Coding-Specialized 1T MoE Open-Weights Model, Modified MIT License, $0.95/$4.00 per 1M, 256K Context — But Japanese Enterprises Face Two Critical Caveats (Cross-Border Data and Unverified Benchmarks)

A primary-source deep dive on Kimi K2.7-Code, released June 12, 2026 by Moonshot AI (Beijing). Grounded in the Hugging Face model card, MarkTechPost, and VentureBeat's skepticism piece. Covers the 1T-total / 32B-active MoE architecture (384 experts, 8 routed + 1 shared), 256K context, MoonViT ~400M vision encoder, native INT4, forced-on thinking mode. License is Modified MIT (attribution required only above 100M MAU or $20M MRR), API pricing is $0.95 input / $0.19 cache-hit / $4.00 output per 1M tokens — roughly 1/18 of Claude Opus 4.8's output price. OpenAI + Anthropic-compatible endpoints drop straight into Claude Code / Cursor / Aider / Cline / cmux. Moonshot self-reports +21.8% vs K2.6 on its own Kimi Code Bench v2 and -30% reasoning tokens, but all public benchmarks are Moonshot's own proprietary suites; independent SWE-bench Verified / Pro / FrontierCode scores are not yet available as of June 15, 2026 (VentureBeat). For Japanese enterprises the column flags two critical caveats: (1) both api.moonshot.cn and the Singapore-subsidiary-run api.moonshot.ai remain exposed to PRC National Intelligence Law Article 7 compelled disclosure (set against Japan's PPC DeepSeek alert of February 3, 2025 and the Digital Agency notice of February 6, 2025), and (2) the only reliable mitigation is Hugging Face self-hosting (~4-8 H100, ~595GB INT4) following the Mizuho / Lion Qwen-on-domestic-infrastructure precedent.
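The quoted Kimi K2.7-Code prices differ fivefold between uncached ($0.95) and cache-hit ($0.19) input, so the cache-hit share matters for agentic workloads that resend long contexts. A minimal estimator sketch follows; `cache_hit_ratio` is an assumed input for illustration, not a figure Moonshot reports.

```python
# Sketch: estimate Kimi K2.7-Code API spend from the quoted per-1M-token prices.
PRICE_INPUT = 0.95      # USD per 1M uncached input tokens
PRICE_CACHE_HIT = 0.19  # USD per 1M cache-hit input tokens
PRICE_OUTPUT = 4.00     # USD per 1M output tokens


def kimi_cost(input_tokens: int, output_tokens: int, cache_hit_ratio: float = 0.0) -> float:
    """Estimated USD cost; cache_hit_ratio is the assumed share of input served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    usd = uncached * PRICE_INPUT + cached * PRICE_CACHE_HIT + output_tokens * PRICE_OUTPUT
    return usd / 1_000_000


if __name__ == "__main__":
    # A hypothetical agent session: 2M input tokens (80% cache hits), 100K output tokens.
    print(round(kimi_cost(2_000_000, 100_000, cache_hit_ratio=0.8), 4))  # prints 1.084
```

With no cache hits, the same hypothetical session would cost $2.30 instead of about $1.08, so the ratio is worth measuring before comparing against other providers.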

Moonshot AI · Kimi K2.7-Code · Open Weight LLM · +4

DiffusionGemma Deep Dive — Google DeepMind's June 10, 2026 Open-Weight Text-Diffusion LLM, Same Backbone as Gemma 4 26B (A4B MoE), Up to 4× Faster Than AR Counterparts, Apache 2.0, With an Honest "Quality Trails AR" Disclosure

A primary-source deep dive on DiffusionGemma (google/diffusiongemma-26B-A4B-it, 25.2B total / 3.8B active MoE), released June 10, 2026 by Google DeepMind in coordination with NVIDIA. Grounded in the official Google blog, ai.google.dev model card, Hugging Face card, and NVIDIA's blog. Where autoregressive (AR) models generate one token at a time left-to-right, diffusion language models (DLMs) denoise a 256-token canvas in parallel into final text. 15-20 tokens commit per forward pass, up to 48 denoising steps, 1,000+ tok/sec on H100, 700+ on RTX 5090, ~3.5–4× the throughput of the AR Gemma 4 counterpart. Crucially, Google openly states that quality lags AR: MMLU Pro 77.6 vs 82.6, GPQA 73.2 vs 82.3, MMMU Pro 54.3 vs 73.8. Apache 2.0, distributed via Hugging Face / Vertex AI / NVIDIA NIM — the first large-scale open-weight diffusion LLM in the industry. The column covers practical implications for Japanese enterprises (on-prem internal agents, code editing, low-latency workflows) and positioning against Mercury (Inception Labs), LLaDA, and Gemini Diffusion.
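The difference between one-token-per-pass AR decoding and parallel canvas denoising is easiest to see as a pass count. Below is a toy sketch, assuming each pass commits a fixed number of the most "confident" masked positions, with random numbers standing in for model confidence; DiffusionGemma's real remasking schedule is not described in the summary.

```python
import math
import random

# Toy model of parallel denoising on a fixed-size canvas. ASSUMPTIONS: every
# pass commits exactly `per_step` masked positions, chosen by a random stand-in
# for model confidence; real DLM schedules and confidence scores will differ.
MASK = None


def denoise(canvas_len: int, per_step: int, max_steps: int, seed: int = 0):
    """Fill a fully masked canvas; return (canvas, number of forward passes)."""
    rng = random.Random(seed)
    canvas = [MASK] * canvas_len
    steps = 0
    while MASK in canvas and steps < max_steps:
        masked = [i for i, tok in enumerate(canvas) if tok is MASK]
        # A real model would score every masked slot in one forward pass.
        confidence = {i: rng.random() for i in masked}
        for i in sorted(masked, key=confidence.get, reverse=True)[:per_step]:
            canvas[i] = f"tok{i}"  # commit the most confident positions
        steps += 1
    return canvas, steps


if __name__ == "__main__":
    for k in (15, 20):
        _, steps = denoise(canvas_len=256, per_step=k, max_steps=48)
        print(f"{k} tokens/pass: {steps} passes (ceil = {math.ceil(256 / k)}) vs 256 AR passes")
```

Fewer passes do not translate one-for-one into speed: each pass works over the whole canvas, which is in part why the reported gain is roughly 3.5 to 4× rather than the 14 to 20× ratio in pass counts.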

Google DeepMind · Gemma 4 · DiffusionGemma · +5

Cognition AI's FrontierCode Explained: The Next-Gen Coding AI Benchmark That Asks 'Is It Mergeable?'

On June 8, 2026, Cognition AI unveiled FrontierCode — not a product, but a coding AI evaluation benchmark. It measures not just 'does it pass tests' but 'would an OSS maintainer actually merge this?' across six axes. This article covers its differences from SWE-bench Verified, the three-tier dataset (Diamond/Main/Extended), official results with Claude Opus 4.8 leading at 13.4% on Diamond, and its relevance to Japan's rigorous code-review culture.

Cognition AI · FrontierCode · SWE-bench · +4

Apple AFM Core Advanced Deep Dive — How 20B Sparse MoE Brings Frontier AI to iPhone

AFM Core Advanced, the flagship of Apple's third-generation Foundation Models announced at WWDC 2026, packs a 20B-parameter Sparse MoE with Apple's proprietary IFP technology — enabling frontier-class on-device inference on iPhone 17 Pro. This deep dive covers architectural innovations, A19 Pro specs, device requirements, and the 'fully Apple designed' controversy around Gemini distillation.

Apple · AFM · Apple Foundation Models · +5

Google NotebookLM 'Better Research' Update: Full Breakdown — Gemini 3.5 × Antigravity Turns It Into an Active Research Agent

On June 8, 2026, Google announced a major update to NotebookLM called 'Better Research.' The AI engine was upgraded to Gemini 3.5, and the new-generation coding agent framework Antigravity was integrated. Each workspace now gets a secure VM for code execution and diverse file output. The new agentic research feature lets NotebookLM autonomously discover and analyze primary sources from the web — no pre-uploaded sources required. This article covers the official benchmarks (65%+ win rate, 4x faster than rival LLMs), pricing tiers, competitive landscape, and practical use cases for Japanese enterprises.

NotebookLM · Google · Gemini 3.5 · +3

Software Development · 2026-06-10

Complete Guide to Obsidian 2026 — Turn Knowledge into Assets with Local-First PKM

Obsidian is a local-first knowledge management tool developed by Canada-based Dynalist Inc. Since commercial use became fully free in February 2025, it has become the top PKM choice for Japanese business professionals — IT consultants, engineers, researchers, and legal practitioners alike. This guide covers everything from basic Vault setup and the new 2026 Bases feature to local LLM integration via Ollama, Zettelkasten and PARA workflows, and Japanese-language caveats, so you can start today.
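For the Ollama integration mentioned above, here is a minimal sketch that gathers Vault notes and sends them to a local model through Ollama's `/api/generate` endpoint. Assumptions: Ollama runs on its default port 11434, `llama3.2` is only an example model name, and the naive character cap stands in for proper note selection.

```python
import json
import pathlib
import urllib.request

# Sketch: ask a local Ollama model a question grounded in Obsidian Vault notes.
# ASSUMPTIONS: Ollama is running on its default port, and "llama3.2" is only
# an example model name; use whichever model you have pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"


def collect_notes(vault: pathlib.Path, max_chars: int = 8000) -> str:
    """Concatenate Markdown notes, skipping Obsidian's .obsidian config folder."""
    parts = []
    for path in sorted(vault.rglob("*.md")):
        if ".obsidian" in path.parts:
            continue
        # Read as UTF-8 explicitly so Japanese notes decode correctly on any OS.
        parts.append(f"## {path.stem}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)[:max_chars]


def build_request(question: str, notes: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a POST request for Ollama's generate endpoint."""
    payload = {
        "model": model,
        "prompt": f"Answer using only these notes.\n\n{notes}\n\nQuestion: {question}",
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(req: urllib.request.Request) -> str:
    """Send the request and return the model's generated text."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with a hypothetical Vault path: `ask(build_request("What are my open projects?", collect_notes(pathlib.Path("~/MyVault").expanduser())))`. Everything stays on the local machine, which is the main reason to pair a local-first PKM with a local model.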

Obsidian · PKM · Knowledge Management · +5

Claude Fable 5 and Claude Mythos 5 Deep Dive — Anthropic's New "Mythos-Class" Top Tier Announced June 9, 2026, at $10/$50 per 1M Tokens, With Auto-Fallback to Opus 4.8 and Project Glasswing Gating

A grounded read of Claude Fable 5 and Claude Mythos 5, officially announced by Anthropic on June 9, 2026 (release post, Fable, Mythos). The two share an identical base model — Fable ships publicly with safeguards, Mythos is restricted to vetted partners. The release introduces a new top tier above Opus / Sonnet / Haiku — the "Mythos class". Pricing is $10 input / $50 output per 1M tokens (under half of Mythos Preview); cybersecurity, bio-chemical, and distillation-attempt queries are detected and auto-routed to Claude Opus 4.8 (firing on under 5% of sessions); Hex's core analysis benchmark hit 90% for the first time ever; and Mythos 5 itself is gated to Project Glasswing-vetted cyber defenders and biomedical researchers.

Anthropic · Claude · Claude Fable 5 · +4

cmux Deep Dive: Manaflow's macOS-Native Terminal for Running AI Agents in Parallel

cmux by Manaflow (YC S24) is a macOS-native terminal built specifically for running multiple AI agents in parallel. Powered by Ghostty's libghostty in Swift/AppKit, it offers vertical tabs, notification rings, an embedded browser, and a socket API. This column covers its features, install, use cases, competitive landscape, and considerations for enterprise adoption in Japan.

cmux · Manaflow · AI Agent · +5
