<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Oflight Inc. Column</title>
    <link>https://www.oflight.co.jp/en/columns</link>
    <description>Oflight Inc. is a Tokyo-based IT company providing software development, digital signage, web development, network systems, and IoT/AR/VR technology services.</description>
    <language>en</language>
    <lastBuildDate>Wed, 01 Jul 2026 23:25:17 GMT</lastBuildDate>
    <atom:link href="https://www.oflight.co.jp/feed.en.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title><![CDATA[Claude Fable 5 Returns on July 1, 2026 After a 19-Day Shutdown — U.S. Commerce Department Lifts Export Control
Amazon's Jailbreak Research Was the Trigger; Anthropic Applied Mitigations Before Resuming
Cursor Restored Fable 5 the Same Evening (Leads CursorBench, Most Expensive Per Task); Devin Fusion + Fable 5 Delivers 41% Cost Reduction vs Fable 5 Alone]]></title>
      <link>https://www.oflight.co.jp/en/columns/claude-fable-5-return-devin-cursor-2026-07</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/claude-fable-5-return-devin-cursor-2026-07</guid>
      <description><![CDATA[**Anthropic's Claude Fable 5 and Mythos 5 returned globally on July 1, 2026** after a 19-day shutdown ([CNBC](https://www.cnbc.com/2026/06/30/anthropic-says-trump-admin-has-lifted-export-controls-on-claude-fable-5-and-mythos-5.html) / [9to5Mac](https://9to5mac.com/2026/07/01/claude-fable-5-cleared-to-return-as-us-lifts-anthropics-export-control-restriction/) / [VentureBeat](https://venturebeat.com/technology/anthropic-is-bringing-back-claude-fable-5-globally-after-us-lifts-export-control-order-where-can-enterprises-access-it) / [The Hacker News](https://thehackernews.com/2026/07/anthropic-restores-claude-fable-5-after.html)).

**Timeline**: on **June 12 at 5:21 PM ET the U.S. government issued an export-control directive** (directly triggered by [Amazon researchers reporting a jailbreak](../columns/claude-fable-5-export-control-suspension-2026-06)) → Anthropic immediately **shut down Fable 5 and Mythos 5 worldwide** (it had no way to verify user nationality in real time) → **June 26: Mythos 5 was cleared for 100+ U.S. institutions** → **June 30: the Commerce Department lifted controls** → **July 1: global restoration** across Claude Platform / Claude.ai / Claude Code / Claude Cowork / AWS Bedrock / Google Vertex AI / Microsoft Foundry.

**Anthropic's mitigations**: the Amazon-reported jailpath was addressed, commitments to "proactively detect and address security risks," collaboration with government on future release protocols, and a reporting channel for observed malicious use.

**Fable 5 pricing**: **$10 / M input, $50 / M output** (per apidog). Pro / Max / Team / Enterprise subscribers get an extra allocation up to **50% of their weekly usage limits through July 7**, then usage-credit billing.

**Cursor integration**: Cursor officially announced Fable 5 was back at **7:48 PM PT on July 1** on X. "Leads all models on CursorBench — but the most expensive per task," per Cursor ([Cursor Community Forum](https://forum.cursor.com/t/will-fable-5-be-returning-in-cursor/164531) / [apidog setup guide](https://apidog.com/blog/claude-fable-5-cursor/)).

**Devin Fusion**: before the shutdown, **Devin Fusion + Fable 5 achieved a 41% cost reduction over Fable 5 alone** using a parallel two-agent architecture, delivering frontier-level performance at **~35% lower cost** (The New Stack).

**Enterprise lessons from the 19-day outage**: financial services, healthcare, SaaS, and critical-infrastructure firms lost access to production-embedded tools without warning. The phrase **"frontier access is now conditional infrastructure"** entered the discourse. **Cloud-platform diversification (AWS + GCP + Azure) offers no protection when the model itself is restricted**. OpenAI reportedly **expanded market share** during Anthropic's outage through pre-clearance dealings with regulators (MarketScale).

**Strategic context**: [Anthropic released Sonnet 5 on June 30](../columns/claude-sonnet-5-anthropic-release-2026-06-30) — the same day export controls lifted. The timing looks intentional: **Sonnet 5 is positioned as a fallback to Fable 5**, and heading into its IPO, Anthropic now has a clear **two-tier stack** — frontier Fable 5 for peak capability + mid-tier Sonnet 5 for cost-efficient volume.]]></description>
      <pubDate>Thu, 02 Jul 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[What Is Cursor for iOS? Control Coding Agents from iPhone and iPad (Composer 2.5, iOS 26+, Public Beta from June 29, 2026)]]></title>
      <link>https://www.oflight.co.jp/en/columns/cursor-ios-mobile-coding-agent-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/cursor-ios-mobile-coding-agent-2026-06</guid>
      <description><![CDATA[**[Cursor](https://cursor.com/) (by Anysphere) released its official iOS public beta on June 29, 2026** ([official blog](https://cursor.com/blog/ios-mobile-app) / [Changelog](https://cursor.com/changelog/ios-mobile-app) / [App Store](https://apps.apple.com/app/cursor/id6767085653) / [TechCrunch](https://techcrunch.com/2026/06/29/cursor-now-has-a-mobile-app-for-guiding-your-coding-agent-on-the-go/) / [9to5Mac](https://9to5mac.com/2026/06/29/cursor-releases-iphone-and-ipad-app-following-recent-acquisition-by-spacex/)).

**You can now drive Cursor's coding agents directly from iPhone and iPad** (iOS 26.0+ required). Available to all paid-plan subscribers as a public beta, with **75% off Composer 2.5 runs through July 5, 2026**.

**Main capabilities**:
- **Launch cloud / background agents** or **remote-control desktop-running agents** (with handoff between the two)
- **Voice input for task description**, **slash commands** to guide the agent
- **Push notifications + lock-screen Live Activities** when work is ready
- **Review generated demos, screenshots, logs, and videos**; annotate screenshots
- **Inspect diffs and merge PRs** from the phone
- **Model selection** across Composer 2.5, GPT (OpenAI), Claude (Anthropic), Gemini (Google), etc.
- **MCP integrations** — query Datadog logs, summarize Slack activity, etc.

**Strategic context**: this lands alongside the **$60B SpaceX acquisition option on Cursor** (referenced in our [Grok Build column](../columns/grok-build-xai-cli-coding-agent-2026-06)) and operationalizes the **"independent coding agents" thesis** of Cursor 2.0 (Oct 2025). Anthropic's Boris Cherny (Claude Code lead) is quoted saying **"I do most of my coding on my phone now"** (TechCrunch). Joins **Claude Code Mobile / Codex Mobile / GitHub Copilot Mobile** as the fourth major mobile coding-agent surface.

**Caveats**: public beta only, the Composer 2.5 promo ends July 5, 2026, no Android or Web yet, requires iOS 26.0+ (older iPhones cannot install), and large-codebase manipulation from mobile alone is still constrained.]]></description>
      <pubDate>Wed, 01 Jul 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Claude Sonnet 5 Deep Dive — Anthropic's June 30, 2026 Release Hits 92.4% on SWE-Bench Verified (+12pt Over Opus 4.6)
1M-Token Context, 88.3% on OSWorld-Verified (Beats the 72.4% Human Expert Baseline), 96.2% on GPQA Diamond, 84.7% on ARC-AGI-2
Introductory $2 / $10 per M Tokens Through August 31, 2026 → Standard $3 / $15, Now the Default in Claude Free / Pro and Claude Code Pro]]></title>
      <link>https://www.oflight.co.jp/en/columns/claude-sonnet-5-anthropic-release-2026-06-30</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/claude-sonnet-5-anthropic-release-2026-06-30</guid>
      <description><![CDATA[**Anthropic released Claude Sonnet 5 on June 30, 2026** ([official release](https://www.anthropic.com/news/claude-sonnet-5) / [System Card](https://www.anthropic.com/claude-sonnet-5-system-card) / [TechCrunch](https://techcrunch.com/2026/06/30/anthropic-launches-claude-sonnet-5-as-a-cheaper-way-to-run-agents/) / [VentureBeat](https://venturebeat.com/technology/anthropic-launches-claude-sonnet-5-at-a-steep-discount-to-its-top-model-as-the-company-races-toward-a-blockbuster-ipo)).

**The headline: the mid-tier Sonnet just leapfrogged Opus 4.6 by 12 points** — **92.4% on SWE-Bench Verified** (Opus 4.6 was 80.8%), **88.3% on OSWorld-Verified** (15.9 pts ahead of the 72.4% human-expert baseline), **96.2% on GPQA Diamond** (over [Gemini 3.1 Pro's](../columns/local-llm-landscape-2026-june-update) 94.3%), and **84.7% on ARC-AGI-2** (7.6 pts ahead of Gemini 3.1 Pro's 77.1%). It ships with a **1M-token context window** (matching Opus 4.8) and a 128K max output.

**Strategic pricing on the eve of Anthropic's IPO**: **introductory $2 / M input and $10 / M output through August 31, 2026**, after which standard pricing becomes **$3 / $15** (matching [Sonnet 4.6](../columns/claude-agent-sdk-credit-billing-change-2026-06-15)). Note: **a new tokenizer maps the same input to about 1.0–1.35× more tokens**. It undercuts GPT-5.5, Gemini 3.1 Pro, and Anthropic's own Opus 4.8 on price.

**Default-model rollout**: now the **default in claude.ai Free and Pro**, **default in Claude Code Pro**, and available via API (`claude-sonnet-5`), AWS Bedrock, Vertex AI, and Managed Agents. Zapier's Daniel Shepard told TechCrunch that **"earlier Sonnet versions would stall on multi-step tasks — Sonnet 5 finishes them end-to-end."**

**Safety**: lower misalignment than Sonnet 4.6, cyber safeguards on by default, and a **0.0% exploit-creation rate** on Firefox vulnerability tests.

**Strategic context**: agentic capability is now "table stakes" across foundation-model companies; competition has shifted to **cost-efficiency, reliability, and autonomous-task completion**. Heading into its IPO, Anthropic is breaking the boundary between its Opus and Sonnet tiers to **win the cost / performance contest in high-volume production workloads**.]]></description>
      <pubDate>Wed, 01 Jul 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[[PR] Horiemon AI School Individual Plan / 1-Day Camp at ¥20,000 OFF via Oflight's Affiliate Link
Automatic discount (no code entry required; referral code: 5DX4GA)]]></title>
      <link>https://www.oflight.co.jp/en/columns/horiemon-ai-school-individual-camp-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/horiemon-ai-school-individual-camp-2026-06</guid>
      <description><![CDATA[**[ADVERTISEMENT / PR]** This article introduces **Horiemon AI School Inc.** When you sign up via **Oflight's special affiliate link**, the **Horiemon AI School Individual Plan** (standard first payment ¥179,080 tax-inc) and the **1-Day Camp** (standard ¥220,000 tax-inc) get an **automatic ¥20,000 discount** → Individual Plan first payment becomes **¥157,080** and the 1-Day Camp becomes **¥198,000** (no code entry required).

**About the service**: An AI training and consulting business positioned as a **"corporate AI adoption partner."** Reported figures: **200+ adopting companies**, **2,000+ learners**, **75%+ AI-deployment cost reduction** (per the [official LP](https://horiemon.ai/)). The brand is overseen by entrepreneur Takafumi Horie. Both online sessions and in-person sessions in Ichigaya, Tokyo are offered.

**Plans**:
1. **Individual Plan** — a monthly subscription AI skill-acquisition program (monthly online training, content access, community participation)
2. **1-Day Camp** — **two 9am–6pm sessions** online (nationwide) or in person (Ichigaya, Tokyo) at ¥220,000 including tax. A distinctive **step-up system**: **Regular → OB Session → Assistant → Instructor → FC (franchise) membership**. FC members receive a subsidy of **¥16,750 per month**.

**Reader benefit**: applying through the links below gives you the **¥20,000 OFF automatically** (no code entry required on your end). If entering manually, the referral code is **5DX4GA**.

**Individual Plan signup**: [https://tinyurl.com/2yk5m5ac](https://tinyurl.com/2yk5m5ac)
**1-Day Camp signup**: [https://tinyurl.com/27rv7hf3](https://tinyurl.com/27rv7hf3)

**Conflict-of-interest disclosure**: when a signup occurs via a link in this article, Oflight Inc. receives an **affiliate fee of ¥30,000** under the [affiliate terms](https://p.horiemon.ai/aff/terms.html). The ¥20,000 reader discount is provided separately by Horiemon AI School and is independent of Oflight's affiliate fee.]]></description>
      <pubDate>Tue, 30 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[What Is Open GENAI? Japan's Digital Agency Open-Sources Its Government AI Platform
hirokawaguchi/open-genai Brings Full Local Deployment (Keycloak / Ollama / Qdrant / Stable Diffusion / faster-whisper)]]></title>
      <link>https://www.oflight.co.jp/en/columns/open-genai-digital-agency-government-ai-oss-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/open-genai-digital-agency-government-ai-oss-2026-06</guid>
      <description><![CDATA[**Open GENAI** (the OSS release of Japan's government generative-AI platform **源内 / GENAI**) was published by Japan's Digital Agency on **April 24, 2026 on GitHub under MIT license** ([official release](https://www.digital.go.jp/en/news/907c8e5d-2f4f-4bd7-9400-37c9f4221d7d) / [GENAI Web repo](https://github.com/digital-go-jp/genai-web) / [GENAI AI apps repo](https://github.com/digital-go-jp/genai-ai-api)).

**What's released**:
- **GENAI Web**: an AI interface built on TypeScript / React 19 / Zustand 5 / React Router 7 / AWS CDK / Tailwind CSS, with the **Digital Agency design system**
- **GENAI AI Apps**: three development templates — **AWS for administrative RAG / Azure for self-hosted LLM / Google Cloud for a legal-system AI** referencing current statutes

**License**: **MIT + CC BY 4.0** (commercial use, modification, and redistribution allowed)

**Scale**: the foundation of a 2026-fiscal-year **pilot covering ~180,000 government employees across all ministries**, with planned expansion to local governments and private-sector adopters.

**Design thesis**: **REST API-first + an ExApp (external-app integration) microservices model**, unifying AWS / Azure / Google Cloud behind a single interface. The goal is to **structurally eliminate vendor lock-in and duplicate development across agencies**.

**vs GenU (AWS Generative AI Use Cases)**: where GenU leans into AWS managed services (Bedrock Agents / Knowledge Base / MCP), GENAI is **REST-API + ExApp-extensible**, adding **enterprise-grade governance** — team management with RBAC (System Admin / Team Admin / User), SAML multi-IdP, KMS CMEK, TTL data-retention policies, multi-layer WAF, Bedrock Inference Profiles. The trade-off: features like video generation, web extraction, and prompt optimization are **not built in** but delegated to external ExApps.

**Validated models**: Claude Sonnet 4.6, Amazon Nova Lite. **[PLaMo 3.0 Prime](../columns/plamo-3-0-prime-pfn-japanese-llm-2026-06)** has also been selected as a trial model.

**Caveat**: the Digital Agency explicitly states that **"permanent maintenance is not guaranteed and the OSS publication may be terminated in the future."** Long-term operations are the adopting organization's responsibility.

**Fully-local community fork**: **[hirokawaguchi/open-genai](https://github.com/hirokawaguchi/open-genai)** (unofficial, experimental, MIT) swaps Cognito → **Keycloak (SAML)**, Bedrock → **OpenAI-compatible APIs (Ollama / vLLM / LM Studio)**, OpenSearch → **Qdrant**, DynamoDB → **SQLite**, Transcribe → **faster-whisper**, Bedrock image → **Stable Diffusion** — running the full GENAI stack on **a single Docker Compose command** with zero cloud dependency. Recommended Japanese model: **Qwen2.5**. Supports macOS Apple Silicon (Metal), Linux + NVIDIA (CUDA), and CPU-only.

**Oflight's view**: combined with the trends covered in our [Local LLM June 2026 Update](../columns/local-llm-landscape-2026-june-update), Open GENAI is **the frontrunner generative-AI platform for Japanese municipalities and public-sector adopters** — now via two distinct paths (the official cloud-deployed release plus the hirokawaguchi/open-genai local-deployment fork). The column closes with three direct inquiry funnels for Open GENAI evaluation, custom implementation, and ongoing maintenance.]]></description>
      <pubDate>Mon, 29 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[What Is Apple Container? Apple's Official Swift OSS for Running Linux Containers on macOS 26
A Docker Desktop Alternative — Apache 2.0, 44.5k Stars, v1.0.0 (June 9, 2026)]]></title>
      <link>https://www.oflight.co.jp/en/columns/apple-container-macos-linux-runtime-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/apple-container-macos-linux-runtime-2026-06</guid>
      <description><![CDATA[**Apple Container** is **Apple's official Swift OSS for running Linux containers on macOS**, announced at WWDC 2025 ([GitHub: apple/container](https://github.com/apple/container) / [apple/containerization](https://github.com/apple/containerization) / [Apple Open Source](https://opensource.apple.com/projects/container/) / [WWDC25 session](https://developer.apple.com/videos/play/wwdc2025/346/)).

**v1.0.0 shipped June 9, 2026** under Apache 2.0, **44.5k GitHub stars and 1.3k forks** at writing, 98% Swift, **Apple Silicon only**.

**The defining design choice is its "one VM per container" architecture** — unlike Docker Desktop's shared-kernel VM, each container runs in its own lightweight VM for stronger security and resource isolation. Sub-second boot times, minimal root filesystem, default 1 GiB RAM and 4 CPUs per container, and **near-zero idle footprint when nothing is running**.

**Tech stack**: macOS 26's Virtualization.framework + vmnet framework + XPC + launchd + Keychain. The control plane is container-apiserver / container-core-images / container-network-vmnet / container-runtime-linux. **OCI-compatible** with Docker Hub / GHCR; build with the BuildKit-based `container builder`. Cross-arch (arm64 / amd64), with x86 running under Rosetta.

**Where it fits vs Docker Desktop**: Apple Container is strongest at **single-container runs, native isolation, and minimal idle cost**; Docker Desktop still wins on **Compose, ecosystem maturity, and multi-platform support**. **Docker Compose is not supported at v1.0.0**, memory ballooning is partial (released pages may not return to the host — heavy loads may require restarts), and these limits are explicit in the docs.

**Requirements**: **Mac with Apple Silicon + macOS 26** (macOS 15 works with networking constraints; Intel Macs are fully unsupported).

**Use cases**: local backend services, CI-style builds, cross-architecture image generation, data analysis via host-folder mounting, and untrusted-code isolation. It's also **an excellent companion for running [local LLMs](../columns/local-llm-landscape-2026-june-update) on M5 Macs** — Ollama / vLLM containers paired with Apple Container is a natural fit. The column closes with three inquiry funnels for Mac developer environment setup, container migration, and ongoing maintenance.]]></description>
      <pubDate>Mon, 29 Jun 2026 00:00:00 GMT</pubDate>
      <category>Software Development</category>
    </item>
    <item>
      <title><![CDATA[What Is Grok Build? xAI's Official CLI Coding Agent]]></title>
      <link>https://www.oflight.co.jp/en/columns/grok-build-xai-cli-coding-agent-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/grok-build-xai-cli-coding-agent-2026-06</guid>
      <description><![CDATA[**[Grok Build](https://x.ai/cli)** is xAI's official CLI coding agent, **released as early beta on May 14, 2026** ([official](https://x.ai/cli) / [Changelog](https://x.ai/build/changelog) / [npaka's Japanese note write-up](https://note.com/npaka/n/nd813d073196e)).

Alongside Claude Code, Codex CLI, Gemini CLI, and [agmsg](../columns/agmsg-cross-agent-messaging-cli-ai-2026-06), Grok Build is **xAI's entry in the CLI coding-agent space**. Its key differentiators are **up to 8 parallel sub-agents isolated in separate Git worktrees**, a Plan-Review-Approve workflow, **ACP (Agent Client Protocol) + MCP servers**, and **local execution** (air-gap-compatible).

**Model & pricing**:
- Internal model: **grok-build-0.1** (256K context, API rates **$1.00/M input, $2.00/M output**); **Composer 2.5** for SuperGrok / X Premium+
- SWE-Bench Verified **70.8%** (vendor-reported; ~17 pts behind Claude Opus 4.7 87.6% and GPT-5.5 88.7%)
- Access gated to **SuperGrok Heavy ($300/mo; $99/mo introductory promo)**, SuperGrok, and X Premium+

**Install**: `curl -fsSL https://x.ai/cli/install.sh | bash`.

**Integrations**: AGENTS.md / plugins / hooks / Skills / **Plugin Marketplace (MongoDB / Vercel / Sentry / Chrome DevTools / Cloudflare / Superpowers)** / Agent Dashboard / `/goal` long-running autonomous mode / headless `-P` flag for CI/CD.

**Strategic context**: the February 2026 **SpaceX × xAI merger**, SpaceX's disclosed **$60B acquisition option on Cursor**, and Composer 2.5 currently training on **Colossus**, xAI's in-house supercomputer — Grok Build sits inside an unusually vertically-integrated stack.

**Caveats**: roughly **15× the price** of Claude Code / Codex CLI bundles ($20/mo); ~17 pts behind on SWE-Bench Verified; no IDE integration (terminal only); flagship demos such as Arena Mode remain unshipped. Where it wins: **Git-worktree parallelism, xAI ecosystem integration, and local execution**.]]></description>
      <pubDate>Sat, 27 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Ornith-1.0 Deep Dive — DeepReinforce's June 26, 2026 MIT Open-Weights Family Specialized for Agentic Coding
Three Sizes (9B Dense / 35B MoE / 397B MoE), All at 262K Context, Built on Qwen 3.5 + Gemma 4, Shipping in BF16 + FP8 + GGUF
SWE-Bench Verified 82.4% (397B) / 75.6% (35B) / 69.4% (9B), SWE-Bench Pro 62.2%, Vendor-Reported SOTA Among Open Weights at Each Size Tier
Reinforcement Learning Optimizes Both Solution Rollouts AND the Scaffolding That Drives Them — A 'Self-Improving' Design
Compatible With OpenHands / Hermes Agent / OpenClaw, ClawEval Benchmark Published — Directly Relevant to Oflight's OpenClaw Service Users]]></title>
      <link>https://www.oflight.co.jp/en/columns/ornith-1-0-deepreinforce-agentic-coding-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/ornith-1-0-deepreinforce-agentic-coding-2026-06</guid>
      <description><![CDATA[**DeepReinforce released Ornith-1.0 on June 26, 2026** ([official](https://deep-reinforce.com/ornith_1_0.html) / [Hugging Face collection](https://huggingface.co/collections/deepreinforce-ai/ornith-10)). It is an **MIT-licensed open-weights family specialized for agentic coding**, **with no regional restrictions**.

**Three sizes**: [Ornith-1.0-9B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B) (dense, ~19GB BF16) / [Ornith-1.0-35B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B) (MoE) / [Ornith-1.0-397B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B) (MoE, built on Qwen 3.5 + Gemma 4). **All sizes ship 262K context**, with **FP8 and GGUF quantizations released alongside**.

**Benchmarks (vendor-reported, claimed SOTA at each open-weights size tier)**:

| Benchmark | 9B | 35B | 397B |
|---|---|---|---|
| **SWE-Bench Verified** | **69.4%** | **75.6%** | **82.4%** |
| **SWE-Bench Pro** | **42.9%** | **50.4%** | **62.2%** |
| **SWE-Bench Multilingual** | — | — | **78.9%** |
| **Terminal-Bench 2.1** | 43.1% | 64.2% | **77.5-78.2%** |
| **NL2Repo** | 27.2% | 34.6% | **48.2%** |
| **ClawEval** | — | — | **77.1%** |

**Design thesis**: Reinforcement learning optimizes **both the solution rollouts and the scaffolding (the agent structure that drives them) itself** — a 'self-improving' agentic-coding design. It sits naturally next to the [Loop Engineering Maker-Checker](../columns/loop-engineering-ai-agent-paradigm-2026-06) paradigm. Reasoning is exposed via `<think>...</think>` blocks; function calling and tool use are first-class.

**Distribution and ops**: vLLM ≥ 0.19.1 / SGLang ≥ 0.5.9 / Transformers ≥ 5.8.1 / Docker + llama.cpp / Ollama. OpenAI-compatible endpoints. The 9B fits on a single 80GB GPU; 35B and 397B want an **8×80GB GPU node (TP=8)**. Agent-framework compatibility: **OpenHands, Hermes Agent, and [OpenClaw](../services/openclaw-setup)** (Oflight's own service line — and ClawEval is in DeepReinforce's published benchmark set).

**DeepReinforce lineage**: an RL-focused research organization that has previously shipped [CUDA-L1 (avg 3.12× GPU speedup)](https://github.com/deepreinforce-ai/CUDA-L1), [CUDA-L2 (HGEMM kernels beating cuBLAS)](https://github.com/deepreinforce-ai/CUDA-L2), and **IterX (MLSys 2026 NVIDIA Track)**. Ornith-1.0 applies the same RL playbook to LLM self-improvement.

**Positioning**: alongside [Kimi K2.7-Code](../columns/kimi-k2-7-code-moonshot-ai-2026-06) (1T MoE / 32B active) and [GLM-5.2](../columns/local-llm-landscape-2026-june-update) (Intelligence Index v4.1 = 51, open-weights leader), **Ornith-1.0 is at the front of the June-2026 agentic-coding open-weights race**. Against Chinese-origin models (Kimi / GLM), its differentiator is **MIT license + no regional restrictions + a US-flag procurement story**.

**Caveat**: benchmarks are DeepReinforce's own vendor-reported numbers. Independent third-party verification on public leaderboards has not yet appeared (as of June 26, 2026).

The article closes with **three inquiry funnels for Ornith-1.0–era local-LLM evaluation, build, and ongoing maintenance**.]]></description>
      <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[What Is agmsg? Cross-Vendor Messaging for CLI AI Coding Agents]]></title>
      <link>https://www.oflight.co.jp/en/columns/agmsg-cross-agent-messaging-cli-ai-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/agmsg-cross-agent-messaging-cli-ai-2026-06</guid>
      <description><![CDATA[**[agmsg](https://github.com/fujibee/agmsg)** is an open-source (MIT) **cross-vendor messaging tool for CLI AI coding agents** by **fujibee** ([official site agmsg.cc](https://agmsg.cc/)).

It lets **Claude Code, Codex, Gemini CLI, GitHub Copilot CLI, Antigravity, and OpenCode** talk to each other through a shared local SQLite file — so **humans stop being the copy-paste courier between tools**. Tagline: "You stop being the copy-paste courier between your agents."

**Highlights**:
- **Only dependencies are bash and sqlite3** — no daemon, no network, no Python
- **Three delivery modes** — `monitor` (~5s real-time push), `turn` (between-turn polling), or `both`
- **N-agent teams**, role switching (`actas`), spawning new agents (`spawn`), and clean teardown (`despawn`)
- **Not MCP, not subagents, not a message queue** — a peer-to-peer messaging layer between sessions
- **One-line install**: `npx agmsg`
- **Claude Code Plugin Marketplace**: `/plugin install agmsg@fujibee-agmsg`

**Product Hunt #5 Product of the Day on June 9, 2026** (219 upvotes, 39 comments). GitHub stars 859, v1.1.1 (June 25, 2026). Community-built derivatives include agmsg-shogi, agmsg-go, and agmsg-mcp.

**Oflight's take**: unlike [Loop Engineering](../columns/loop-engineering-ai-agent-paradigm-2026-06) or [Sakana Fugu's orchestration model](../columns/sakana-fugu-orchestration-model-2026-06), agmsg occupies a different niche — **peer-to-peer messaging at the same layer, across tools**. It's an especially natural fit for the [Claude Code Agent View parallel-orchestration](../columns/claude-code-agent-view-parallel-orchestration-2026) workflows, and the most pragmatic way to stitch multi-vendor LLMs into one dev workflow. The article closes with **three direct inquiry funnels** for AI agent environment setup and custom integration.]]></description>
      <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Local LLM June 2026 Update — Two Months After Our April Landscape
GLM-5.2 Leads Open Weights at Intelligence Index v4.1 51, MiniMax M3 Ships 1M Context + SWE-Bench Pro 59%, NVIDIA Nemotron 3 Ultra 550B
Blackwell Native MXFP4 Pushes RTX 5090 Into the 30-70B Practical Zone
Japan's SI Market Matures (Intec ¥5M+, Ricoh On-Prem Starter Kit Won the Nikkei Grand Prize, PFN PLaMo Selected for the Digital Agency 'Gennai' Platform)
EU AI Act GPAI Enforcement Starts August 2, 2026]]></title>
      <link>https://www.oflight.co.jp/en/columns/local-llm-landscape-2026-june-update</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/local-llm-landscape-2026-june-update</guid>
      <description><![CDATA[Two months after our [April 2026 local-LLM landscape column](../columns/local-llm-landscape-2026-april-comprehensive-comparison), here is the primary-source update on what has changed.

**Three big shifts**:

**(1) Open-weights have closed the gap with closed-source.** [GLM-5.2](https://simonwillison.net/2026/Jun/17/glm-52/) (Z.ai, MIT, June 16, 2026) tops the Intelligence Index v4.1 at **51** (MiniMax M3 44 / DeepSeek V4 Pro 44 / Kimi K2.6 43). [MiniMax M3](https://kilo.ai/open-source-models) ships **1M context + native multimodality + SWE-Bench Pro 59.0% + Terminal-Bench 2.1 66.0% + MCP Atlas 74.2%**. [NVIDIA Nemotron 3 Ultra](https://research.nvidia.com/labs/nemotron/Nemotron-3/) (revealed by Jensen Huang at Computex 2026) is a **550B-parameter** US-flag open-weight leader. [VibeThinker-3B](https://arxiv.org/pdf/2606.16140) (WeiboAI, MIT, Qwen2.5-Coder-3B fine-tune) reaches **frontier-reasoner parity at 3B**.

**(2) Blackwell makes 30–70B models practical on consumer GPUs.** The RTX 5090 has **32GB GDDR7 and 1,792 GB/s bandwidth** (+77% vs 4090) with **native MXFP4 — GGUF Q4 runs with zero emulation overhead**, hitting **5,841 tok/s** on Qwen 2.5-Coder-7B at batch 8 (2.6× A100 80GB). The RTX PRO 6000 Blackwell reaches **~8,425 tok/s** on 30B; the B200 ships **192GB HBM3e at 8 TB/s** (4–5× H100).

**(3) Japan's SI market is maturing.** **Intec** (TIS group) launched local-LLM deployment SI on January 29, 2026 — **minimum 1 month, from ¥5,000,000+ ex tax** — targeting manufacturing and finance. **Ricoh's 'RICOH On-Prem LLM Starter Kit'** won the **2025 Nikkei Excellent Product/Service Award grand prize** (Qwen2.5-VL-32B-Instruct base). PFN's [PLaMo 3.0 Prime](../columns/plamo-3-0-prime-pfn-japanese-llm-2026-06) was selected for the Japanese **Digital Agency 'Gennai'** common generative-AI platform — alongside the Mizuho / Lion Qwen on-domestic-infrastructure precedent.

The column also covers concurrent moves on [Kimi K2.7-Code](../columns/kimi-k2-7-code-moonshot-ai-2026-06), [Sakana Fugu](../columns/sakana-fugu-orchestration-model-2026-06), [DiffusionGemma](../columns/diffusiongemma-google-text-diffusion-2026-06), and [Liquid AI LFM2.5-J](../columns/liquid-ai-lfm25-japanese-models-2026-06).

Inference-engine selection (**AWQ + vLLM for GPU, GGUF + llama.cpp for CPU/edge, SGLang for agents, TensorRT-LLM for NVIDIA clusters**), quantization (BitNet 1.58-bit / MXFP4 / AWQ), regulation (**EU AI Act GPAI enforcement from August 2, 2026; systemic-risk threshold of 10^25 FLOPs**, US [Fable 5 export-control precedent](../columns/claude-fable-5-export-control-suspension-2026-06), Chinese-model cross-border data), typical GPU configurations by workload, and a three-step Oflight-recommended adoption path are all covered.

The article closes with **three direct inquiry funnels** for local-LLM evaluation, build, and ongoing maintenance.]]></description>
      <pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Loop Engineering Deep Dive — The June 2026 Successor to Prompt / Context / Harness Engineering, Crystallized by Anthropic's Boris Cherny ('I don't prompt Claude anymore — I write loops'), Named and Codified by Addy Osmani, with Six Building Blocks (Automations, Worktrees, Skills, Plugins, Maker-Checker Sub-agents, Durable State) Mapped Onto Claude Code's Existing Feature Set]]></title>
      <link>https://www.oflight.co.jp/en/columns/loop-engineering-ai-agent-paradigm-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/loop-engineering-ai-agent-paradigm-2026-06</guid>
      <description><![CDATA[A primary-source deep dive on **Loop Engineering**, the June 2026 AI-engineering trend named and codified by Google Chrome DevRel lead **Addy Osmani** in his ["Loop Engineering" blog post](https://addyosmani.com/blog/loop-engineering/) and elevated to industry attention by Anthropic Claude Code lead **Boris Cherny's** quote — **"I don it prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."** ([reported by The New Stack](https://thenewstack.io/loop-engineering/)). Covers the four-generation lineage: Prompt Engineering (2022-2024) → Context Engineering (2025, coined by Shopify CEO Tobi Lütke, formalized in [Anthropic's Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)) → Harness Engineering (early 2026) → **Loop Engineering (June 2026 onwards)**. Grounded in Peter Steinberger's seed phrase — **"you should be designing loops that prompt your agents"** — the column maps out the six building blocks: (1) Automations / Trigger (timer- or event-driven heartbeats), (2) Worktrees (isolated git checkouts to prevent parallel sub-agent collisions), (3) Skills (SKILL.md / CLAUDE.md to externalize intent and reduce "intent debt"), (4) Plugins / Connectors via MCP (execution permissions), (5) Maker / Checker Sub-agents (separating generation from verification), and (6) Durable State (memory belongs on disk, not in context). Explains Inner Loop vs Outer Loop, how Claude Code's `/goal`, Automations, Worktrees, Skills, and Sub-agents constitute a ready-made Loop Engineering toolkit, the surge of Japanese coverage on Qiita / Zenn / DevelopersIO / note / OptiMax, and the five major risk vectors: Cognitive Surrender (Osmani's central warning), Loop Brittleness, Verifier mis-grading, HITL approval fatigue, and runaway-loop cost explosion.]]></description>
      <pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[PLaMo 3.0 Prime Deep Dive — Preferred Networks' Flagship Japanese LLM Officially Released June 22, 2026, Expanded from 64K to 256K Context, Dual Reasoning / Non-reasoning Variants, ¥60 / ¥250 per 1M Tokens, Selected for the Digital Agency's Common 'Gennai' Generative-AI Platform, Built From-Scratch on NICT Collaboration and METI GENIAC Phase 3 Outputs]]></title>
      <link>https://www.oflight.co.jp/en/columns/plamo-3-0-prime-pfn-japanese-llm-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/plamo-3-0-prime-pfn-japanese-llm-2026-06</guid>
      <description><![CDATA[**Preferred Networks (PFN) officially released PLaMo 3.0 Prime on June 22, 2026** ([official press pr20260622](https://www.preferred.jp/ja/news/pr20260622) / [tech blog](https://tech.preferred.jp/ja/blog/plamo-3-0-prime-release/)). Successor to PLaMo 2.0 Prime (2025 Nikkei Excellent Product Award grand prize), this is the production rollout after a 3-month monitor program following the March 19, 2026 beta. **Context extended from 64K beta → 256K production**, **dual Reasoning / Non-reasoning variants**, proprietary tokenizer optimized for Japanese token efficiency, post-training with SFT + DPO + RL. **Compared against** gpt-oss-120b / Qwen3.6-27B (open) and GPT-5.4 mini / Claude Haiku 4.5 (closed in the same price tier). **Evaluated on 15 benchmarks**: JFBench / IFBench / Japanese MT-Bench / lawqa_jp / MedRECT / Japanese Medical Licensing Exam / MT-Bench / AIME 2024 / GPQA-Diamond / BFCL / LongBench v2 / HELM Safety. PFN CEO/CTO Daisuke Okanohara claims parity or superiority over same-tier models on Japanese instruction-following, coding, and tool-use, while ITmedia at the beta stage noted weakness in math and multi-tool selection. **Pricing is aggressive**: Standard plan **¥60 input / ¥250 output per 1M tokens** (up to 128K), Free plan pending, Provider plan custom-quoted. **Distribution**: PLaMo API (SaaS), on-premise, Amazon Bedrock Marketplace, Snowflake. **Prime itself is closed-weights**, but NICT-co-developed base models `plamo-3-nict-2b/8b/31b-base` are open on Hugging Face. **Adoption**: standard model in miibo / Tachyon / QommonsAI, and **selected as a trial model for the Japanese Digital Agency's common generative-AI platform 'Gennai'**. **Not disclosed**: parameter count, dense vs MoE, and independent third-party benchmark verification — pending Nejumi LLM Leaderboard registration.]]></description>
      <pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Sakana Fugu Deep Dive — The June 22, 2026 'LLM Trained to Call Other LLMs' from Sakana AI: Dynamic Orchestration Across GPT-5.5 / Claude Opus 4.8 / Gemini 3.1 Pro, Powered by the ICLR 2026 TRINITY / Conductor Papers, Claiming 73.7 on SWE-Bench Pro (Beating Opus 4.8), Shipping as Fugu / Fugu Ultra with $20 / $100 / $200 Subscription Tiers — EU/EEA Excluded Pending GDPR Compliance]]></title>
      <link>https://www.oflight.co.jp/en/columns/sakana-fugu-orchestration-model-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/sakana-fugu-orchestration-model-2026-06</guid>
      <description><![CDATA[**Sakana AI officially launched Sakana Fugu on June 22, 2026** ([fugu-release](https://sakana.ai/fugu-release/) / [product page](https://sakana.ai/fugu/) / [gihyo.jp](https://gihyo.jp/article/2026/06/sakana-fugu) / [GIGAZINE](https://gigazine.net/gsc_news/en/20260622-sakana-fugu-multi-agent-system-ai)). Critically, this is **not a next-generation Japanese LLM — it is an LLM trained to call other LLMs**, a 'conductor' model that dynamically orchestrates frontier models inside the loop. When you send a query, Fugu itself either (1) answers directly when it can, or (2) for complex multi-step tasks **selects, dispatches, verifies, and integrates** from an agent pool that includes **GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro** and others. **Academic basis**: two ICLR 2026 papers — **TRINITY** (an evolutionarily optimized LLM coordinator that dynamically assigns Thinker / Worker / Verifier roles) and **Conductor** (RL-discovered coordination strategies expressed in natural language). **Two variants**: **Fugu** (everyday tasks, low latency) and **Fugu Ultra** (hardest problems, deep coordination — pool composition is fixed and cannot be excluded). **Benchmarks**: **SWE-Bench Pro 73.7** (reported to beat Claude Opus 4.8, per XenoSpectrum), Terminal-Bench 2.1 above Anthropic's latest, Charxiv Reasoning above Claude Mythos Preview — but **lags on Humanity's Last Exam (HLE)**. Sakana's own framing is conservative: "shoulder-to-shoulder with Fable 5 and Mythos Preview," not blanket dominance. **Pricing**: Fugu Ultra at **$5/M input ($10/M >272K) and $30/M output ($45/M >272K)**, plus **subscriptions at Standard $20 / Pro $100 / Max $200 per month** (both Fugu and Fugu Ultra). Enterprise is usage-based. **OpenAI-compatible API** at console.sakana.ai. **Not available in the EU/EEA** pending GDPR compliance; Japan-region usage works. **The strategic point is structural resilience, not raw performance** — escape from single-vendor dependence and diversification against export-control risk (directly continuing our Sakana Marlin column's Fable 5 export-restriction thread). BuildFastWithAI calls it 'the orchestration model that routes around export controls,' and Clanker Cloud frames it as 'Model Orchestration Is Becoming the Product.' **Fugu's own parameter count, Japanese-specific benchmark scores (ELYZA / JMMLU / JMT-Bench), and individual statements from David Ha / Llion Jones are not yet confirmed**, leaving 'thin wrapper over external APIs' criticism and independent verification as open questions.]]></description>
      <pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[[Update 2026-06-16: Paused] Anthropic Pauses the June 15 Claude Agent SDK Credit Pool Split — Official Help Center Notice Reverts Behavior to Subscription Usage Limits, Previously Announced $20 / $100 / $200 Monthly Credits No Longer Available]]></title>
      <link>https://www.oflight.co.jp/en/columns/claude-agent-sdk-credit-billing-change-2026-06-15</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/claude-agent-sdk-credit-billing-change-2026-06-15</guid>
      <description><![CDATA[**June 16, 2026 Update**: On the very day of enforcement (June 15, 2026), Anthropic **paused the planned split of Claude Agent SDK, `claude -p`, GitHub Actions, and third-party app (OpenClaw, Zed, Conductor, etc.) usage from subscription rate limits**. The [official Help Center article](https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan) was amended with: "Update June 15: We are pausing the changes to Claude Agent SDK usage described below. For now, nothing has changed: Claude Agent SDK, `claude -p`, and third-party app usage still draw from your subscription is usage limits. The previously announced monthly credit, which would have been available to eligible claimants in connection with these changes, isn it available. We are working to update the plan to better support how users build with Claude subscriptions. When we have an update, we will share it before anything takes effect." **The previously announced monthly credits (Pro $20 / Max 5x $100 / Max 20x $200 / Team $20-100 / Enterprise $200) were not distributed**. Programmatic usage now once again draws from standard subscription limits. The change is officially a **pause, not a full rollback** — Anthropic says it is reworking the plan and will share details before anything new ships. The backlash that triggered this was substantial: [community estimates](https://gist.github.com/MagnaCapax/d9177e35b355853f03c730dfcaa693ef) projected effective price hikes of 12-175x against API-rate equivalents, Anthropic engineer Lydia Hallie was quickly Community-Noted on X, and Reddit r/ClaudeAI, HN, and [The New Stack](https://thenewstack.io/anthropic-agent-sdk-credits/) all carried critical coverage. This is Anthropic is **third subscription-policy reversal of 2026** (January OAuth block reversed within days, April 4 third-party agent ban reversed within 24 hours, and now the May 14 compromise credit pool paused on its June 15 enforcement day). This column preserves the original announced design while adding a detailed reversal section: timeline, operational implications, and the current validity of the "turn Extra Usage auto-billing off" guidance.]]></description>
      <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Sakana AI Marlin Deep Dive — Japan's 'Virtual CSO' Ultra Deep Research Agent Explained]]></title>
      <link>https://www.oflight.co.jp/en/columns/sakana-marlin-ultra-deep-research-agent-2026</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/sakana-marlin-ultra-deep-research-agent-2026</guid>
      <description><![CDATA[Sakana AI's first commercial product 'Marlin,' launched June 15, 2026, is an autonomous research agent — not an LLM. Combining AB-MCTS (Adaptive Branching Monte Carlo Tree Search) with multi-LLM collaboration across OpenAI o4-mini, Google Gemini 2.5 Pro, and DeepSeek R1-0528, Marlin operates autonomously for up to ~8 hours per task to generate tens-to-100+ page reports and executive slides. Designed for financial institutions, corporate planning, consulting, and think tanks, it differs fundamentally from OpenAI Deep Research and Gemini Deep Research in both purpose and architecture. This guide covers everything from its technical design to pricing, competitor comparison, and what it means for Japanese enterprises.]]></description>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Claude Fable 5 and Mythos 5 Suspended Under US Export Control Directive — Forced Recall Just 3 Days After Launch]]></title>
      <link>https://www.oflight.co.jp/en/columns/claude-fable-5-export-control-suspension-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/claude-fable-5-export-control-suspension-2026-06</guid>
      <description><![CDATA[On June 12, 2026 at 17:21 ET, Anthropic received an export control directive from the US Department of Commerce Bureau of Industry and Security (BIS) and immediately suspended Claude Fable 5 and Mythos 5 for all customers. Issued just three days after the models' release, this marks what multiple outlets describe as the first publicly known instance of direct US federal government intervention in a commercially deployed frontier AI model. This column covers the legal nature of the directive, the government's rationale and Anthropic's rebuttal, impact scope across API, Bedrock, and Vertex, alternative model options, and practical implications for Japanese enterprises.]]></description>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Kimi K2.7-Code Deep Dive — Moonshot AI's June 12, 2026 Coding-Specialized 1T MoE Open-Weights Model, Modified MIT License, $0.95/$4.00 per 1M, 256K Context — But Japanese Enterprises Face Two Critical Caveats (Cross-Border Data and Unverified Benchmarks)]]></title>
      <link>https://www.oflight.co.jp/en/columns/kimi-k2-7-code-moonshot-ai-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/kimi-k2-7-code-moonshot-ai-2026-06</guid>
      <description><![CDATA[A primary-source deep dive on **Kimi K2.7-Code**, released June 12, 2026 by Moonshot AI (Beijing). Grounded in the [Hugging Face model card](https://huggingface.co/moonshotai/Kimi-K2.7-Code), [MarkTechPost](https://www.marktechpost.com/2026/06/12/moonshot-ai-releases-kimi-k2-7-code-a-coding-model-reporting-21-8-on-kimi-code-bench-v2-over-k2-6/), and [VentureBeat's skepticism piece](https://venturebeat.com/technology/kimi-k2-7-code-cuts-thinking-tokens-30-practitioners-say-benchmarks-dont-check-out). Covers the 1T-total / 32B-active MoE architecture (384 experts, 8 routed + 1 shared), 256K context, MoonViT ~400M vision encoder, native INT4, forced-on thinking mode. License is **Modified MIT** (attribution required only above 100M MAU or $20M MRR), API pricing is $0.95 input / $0.19 cache-hit / $4.00 output per 1M tokens — roughly **1/18 of Claude Opus 4.8's output price**. OpenAI + Anthropic-compatible endpoints drop straight into Claude Code / Cursor / Aider / Cline / cmux. Moonshot self-reports **+21.8% vs K2.6 on its own Kimi Code Bench v2 and -30% reasoning tokens**, but **all public benchmarks are Moonshot's own proprietary suites; independent SWE-bench Verified / Pro / FrontierCode scores are not yet available as of June 15, 2026** (VentureBeat). For Japanese enterprises the column flags two critical caveats: **(1) both `api.moonshot.cn` and the Singapore-subsidiary-run `api.moonshot.ai` remain exposed to PRC National Intelligence Law Article 7 compelled disclosure (set against Japan's PPC DeepSeek alert of February 3, 2025 and the Digital Agency notice of February 6, 2025), and (2) the only reliable mitigation is Hugging Face self-hosting (~4-8 H100, ~595GB INT4) following the Mizuho / Lion Qwen-on-domestic-infrastructure precedent**.]]></description>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[DiffusionGemma Deep Dive — Google DeepMind's June 10, 2026 Open-Weight Text-Diffusion LLM, Same Backbone as Gemma 4 26B (A4B MoE), Up to 4× Faster Than AR Counterparts, Apache 2.0, With an Honest "Quality Trails AR" Disclosure]]></title>
      <link>https://www.oflight.co.jp/en/columns/diffusiongemma-google-text-diffusion-2026-06</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/diffusiongemma-google-text-diffusion-2026-06</guid>
      <description><![CDATA[A primary-source deep dive on **DiffusionGemma** (`google/diffusiongemma-26B-A4B-it`, 25.2B total / 3.8B active MoE), released June 10, 2026 by Google DeepMind in coordination with NVIDIA. Grounded in the [official Google blog](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/), [ai.google.dev model card](https://ai.google.dev/gemma/docs/diffusiongemma/model_card), [Hugging Face card](https://huggingface.co/google/diffusiongemma-26B-A4B-it), and [NVIDIA's blog](https://blogs.nvidia.com/blog/rtx-ai-garage-local-gemma-diffusion/). Where autoregressive (AR) models generate one token at a time left-to-right, diffusion language models (DLMs) **denoise a 256-token canvas in parallel into final text**. 15-20 tokens commit per forward pass, up to 48 denoising steps, 1,000+ tok/sec on H100, 700+ on RTX 5090, ~3.5–4× the throughput of the AR Gemma 4 counterpart. Crucially, Google **openly states that quality lags AR**: MMLU Pro 77.6 vs 82.6, GPQA 73.2 vs 82.3, MMMU Pro 54.3 vs 73.8. Apache 2.0, distributed via Hugging Face / Vertex AI / NVIDIA NIM — the first large-scale open-weight diffusion LLM in the industry. The column covers practical implications for Japanese enterprises (on-prem internal agents, code editing, low-latency workflows) and positioning against Mercury (Inception Labs), LLaDA, and Gemini Diffusion.]]></description>
      <pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Cognition AI's FrontierCode Explained: The Next-Gen Coding AI Benchmark That Asks 'Is It Mergeable?']]></title>
      <link>https://www.oflight.co.jp/en/columns/cognition-frontiercode-benchmark-2026</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/cognition-frontiercode-benchmark-2026</guid>
      <description><![CDATA[On June 8, 2026, Cognition AI unveiled **FrontierCode** — not a product, but a coding AI evaluation benchmark. It measures not just 'does it pass tests' but 'would an OSS maintainer actually merge this?' across six axes. This article covers its differences from SWE-bench Verified, the three-tier dataset (Diamond/Main/Extended), official results with Claude Opus 4.8 leading at 13.4% on Diamond, and its relevance to Japan's rigorous code-review culture.]]></description>
      <pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
    <item>
      <title><![CDATA[Apple AFM Core Advanced Deep Dive — How 20B Sparse MoE Brings Frontier AI to iPhone]]></title>
      <link>https://www.oflight.co.jp/en/columns/apple-afm-core-advanced-wwdc-2026</link>
      <guid isPermaLink="true">https://www.oflight.co.jp/en/columns/apple-afm-core-advanced-wwdc-2026</guid>
      <description><![CDATA[AFM Core Advanced, the flagship of Apple's third-generation Foundation Models announced at WWDC 2026, packs a 20B-parameter Sparse MoE with Apple's proprietary IFP technology — enabling frontier-class on-device inference on iPhone 17 Pro. This deep dive covers architectural innovations, A19 Pro specs, device requirements, and the 'fully Apple designed' controversy around Gemini distillation.]]></description>
      <pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate>
      <category>AI</category>
    </item>
  </channel>
</rss>