株式会社オブライト

Articles tagged "LLM"

1 article

Claude Sonnet 5 Deep Dive: Anthropic's June 30, 2026 Release Hits 92.4% on SWE-Bench Verified (+12 pts Over Opus 4.6). 1M-Token Context; 88.3% on OSWorld-Verified (Beats the 72.4% Human Expert Baseline); 96.2% on GPQA Diamond; 84.7% on ARC-AGI-2. Introductory $2 / $10 per M Tokens Through August 31, 2026 → Standard $3 / $15; Now the Default in Claude Free / Pro and Claude Code Pro

**Anthropic released Claude Sonnet 5 on June 30, 2026** ([official release](https://www.anthropic.com/news/claude-sonnet-5) / [System Card](https://www.anthropic.com/claude-sonnet-5-system-card) / [TechCrunch](https://techcrunch.com/2026/06/30/anthropic-launches-claude-sonnet-5-as-a-cheaper-way-to-run-agents/) / [VentureBeat](https://venturebeat.com/technology/anthropic-launches-claude-sonnet-5-at-a-steep-discount-to-its-top-model-as-the-company-races-toward-a-blockbuster-ipo)).

**The headline: the mid-tier Sonnet just leapfrogged Opus 4.6 by roughly 12 points.** It scores **92.4% on SWE-Bench Verified** (Opus 4.6 scored 80.8%), **88.3% on OSWorld-Verified** (15.9 pts ahead of the 72.4% human-expert baseline), **96.2% on GPQA Diamond** (ahead of [Gemini 3.1 Pro's](../columns/local-llm-landscape-2026-june-update) 94.3%), and **84.7% on ARC-AGI-2** (7.6 pts ahead of Gemini 3.1 Pro's 77.1%). It ships with a **1M-token context window** (matching Opus 4.8) and a 128K-token maximum output.

**Strategic pricing on the eve of Anthropic's IPO**: **introductory pricing is $2 / M input tokens and $10 / M output tokens through August 31, 2026**, after which standard pricing becomes **$3 / $15** (matching [Sonnet 4.6](../columns/claude-agent-sdk-credit-billing-change-2026-06-15)). Note: **a new tokenizer maps the same input to roughly 1.0–1.35× as many tokens**, so the effective cost of a given request can rise even where the list price is unchanged. On list price, it undercuts GPT-5.5, Gemini 3.1 Pro, and Anthropic's own Opus 4.8.

**Default-model rollout**: Sonnet 5 is now the **default in claude.ai Free and Pro** and the **default in Claude Code Pro**, and it is available via the API (`claude-sonnet-5`), AWS Bedrock, Vertex AI, and Managed Agents. Zapier's Daniel Shepard told TechCrunch that **"earlier Sonnet versions would stall on multi-step tasks; Sonnet 5 finishes them end-to-end."**

**Safety**: lower misalignment rates than Sonnet 4.6, cyber safeguards on by default, and a **0.0% exploit-creation rate** on Firefox vulnerability tests.

**Strategic context**: agentic capability is now "table stakes" across foundation-model companies, and competition has shifted to **cost-efficiency, reliability, and autonomous-task completion**. Heading into its IPO, Anthropic is blurring the boundary between its Opus and Sonnet tiers to **win the cost / performance contest in high-volume production workloads**.
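
To make the pricing and tokenizer note above concrete, here is a minimal Python sketch of a cost estimate. The per-million-token prices come from the summary; everything else is an assumption for illustration: the names `Pricing`, `estimate_cost`, and `tokenizer_factor` are hypothetical, the workload size is invented, and the tokenizer factor is applied to input tokens only because the summary describes it in terms of "the same input".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Prices as quoted in the summary above.
INTRO = Pricing(input_per_m=2.0, output_per_m=10.0)     # through August 31, 2026
STANDARD = Pricing(input_per_m=3.0, output_per_m=15.0)  # afterwards


def estimate_cost(input_tokens: int, output_tokens: int,
                  pricing: Pricing, tokenizer_factor: float = 1.0) -> float:
    """Estimate the USD cost of a workload.

    input_tokens is the count under the previous tokenizer.
    tokenizer_factor (assumed range 1.0 to 1.35) scales it to the new
    tokenizer. Assumption: output tokens are left unscaled.
    """
    if tokenizer_factor < 1.0:
        raise ValueError("tokenizer_factor is expected to be >= 1.0")
    scaled_input = input_tokens * tokenizer_factor
    total = scaled_input * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    workload = (10_000_000, 2_000_000)
    for label, pricing, factor in [
        ("intro, no inflation", INTRO, 1.0),
        ("standard, no inflation", STANDARD, 1.0),
        ("standard, 1.35x input", STANDARD, 1.35),
    ]:
        print(f"{label}: ${estimate_cost(*workload, pricing, factor):.2f}")
```

Under these assumptions the hypothetical workload costs about $40 at introductory pricing, about $60 at standard pricing with no token inflation, and about $70.50 at standard pricing if input token counts grow by the full 1.35×. The real factor depends on the content being tokenized, so measuring token counts on representative prompts is more reliable than relying on the upper bound.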

Claude Sonnet 5, Anthropic, LLM