株式会社オブライト

Articles tagged "Agentic Coding"

1 article

AI · 2026-06-26
Ornith-1.0 Deep Dive: DeepReinforce's June 26, 2026 MIT Open-Weights Family Specialized for Agentic Coding. Three Sizes (9B Dense / 35B MoE / 397B MoE), All at 262K Context, Built on Qwen 3.5 + Gemma 4, Shipping in BF16 + FP8 + GGUF. SWE-Bench Verified 82.4% (397B) / 75.6% (35B) / 69.4% (9B), SWE-Bench Pro 62.2% (397B), Vendor-Reported SOTA Among Open Weights at Each Size Tier. Reinforcement Learning Optimizes Both the Solution Rollouts and the Scaffolding That Drives Them: A 'Self-Improving' Design. Compatible With OpenHands / Hermes Agent / OpenClaw, ClawEval Benchmark Published: Directly Relevant to Oflight's OpenClaw Service Users.
**DeepReinforce released Ornith-1.0 on June 26, 2026** ([official](https://deep-reinforce.com/ornith_1_0.html) / [Hugging Face collection](https://huggingface.co/collections/deepreinforce-ai/ornith-10)). It is an **MIT-licensed open-weights family specialized for agentic coding**, **with no regional restrictions**. **Three sizes**: [Ornith-1.0-9B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B) (dense, ~19GB BF16) / [Ornith-1.0-35B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B) (MoE) / [Ornith-1.0-397B](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B) (MoE, built on Qwen 3.5 + Gemma 4). **All sizes ship 262K context**, with **FP8 and GGUF quantizations released alongside**.

**Benchmarks (vendor-reported, claimed SOTA at each open-weights size tier)**:

| Benchmark | 9B | 35B | 397B |
|---|---|---|---|
| **SWE-Bench Verified** | **69.4%** | **75.6%** | **82.4%** |
| **SWE-Bench Pro** | **42.9%** | **50.4%** | **62.2%** |
| **SWE-Bench Multilingual** | n/a | n/a | **78.9%** |
| **Terminal-Bench 2.1** | 43.1% | 64.2% | **77.5-78.2%** |
| **NL2Repo** | 27.2% | 34.6% | **48.2%** |
| **ClawEval** | n/a | n/a | **77.1%** |

**Design thesis**: Reinforcement learning optimizes **both the solution rollouts and the scaffolding itself (the agent structure that drives them)**, a 'self-improving' agentic-coding design. It sits naturally next to the [Loop Engineering Maker-Checker](../columns/loop-engineering-ai-agent-paradigm-2026-06) paradigm. Reasoning is exposed via `<think>...</think>` blocks; function calling and tool use are first-class.

**Distribution and ops**: vLLM ≥ 0.19.1 / SGLang ≥ 0.5.9 / Transformers ≥ 5.8.1 / Docker + llama.cpp / Ollama, all with OpenAI-compatible endpoints. The 9B fits on a single 80GB GPU; the 35B and 397B call for an **8×80GB GPU node (TP=8)**. Agent-framework compatibility: **OpenHands, Hermes Agent, and [OpenClaw](../services/openclaw-setup)** (Oflight's own service line; ClawEval is also in DeepReinforce's published benchmark set).
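As a rough check on the hardware guidance above, the sketch below estimates weight-only memory from parameter count and bytes per parameter. It assumes the model names give total parameter counts (9B, 35B, 397B), uses 2 bytes per parameter for BF16 and 1 for FP8, and ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The names `weight_gb` and `fits` are illustrative and not part of any Ornith or vLLM API.

```python
# Weight-only memory estimate; all figures are rough assumptions, not vendor data.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

# Assumed total parameter counts in billions, read off the model names.
SIZES_B = {"Ornith-1.0-9B": 9.0, "Ornith-1.0-35B": 35.0, "Ornith-1.0-397B": 397.0}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight size in decimal GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, gpus: int, gb_per_gpu: float = 80.0) -> bool:
    """True if the weights alone fit in the node's total GPU memory."""
    return weight_gb(params_billion, dtype) <= gpus * gb_per_gpu


if __name__ == "__main__":
    for name, size in SIZES_B.items():
        for dtype in BYTES_PER_PARAM:
            print(f"{name} {dtype}: ~{weight_gb(size, dtype):.0f} GB, "
                  f"fits 8x80GB: {fits(size, dtype, gpus=8)}")
```

Under these assumptions, the 9B in BF16 comes to about 18 GB, in line with the ~19GB figure quoted above. The 397B in BF16 comes to about 794 GB of weights against 640 GB on an 8×80GB node, whereas the FP8 build (about 397 GB) would fit. The source does not say which precision the TP=8 guidance refers to.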
**DeepReinforce lineage**: an RL-focused research organization that has previously shipped [CUDA-L1 (avg 3.12× GPU speedup)](https://github.com/deepreinforce-ai/CUDA-L1), [CUDA-L2 (HGEMM kernels beating cuBLAS)](https://github.com/deepreinforce-ai/CUDA-L2), and **IterX (MLSys 2026 NVIDIA Track)**. Ornith-1.0 applies the same RL playbook to LLM self-improvement.

**Positioning**: alongside [Kimi K2.7-Code](../columns/kimi-k2-7-code-moonshot-ai-2026-06) (1T MoE / 32B active) and [GLM-5.2](../columns/local-llm-landscape-2026-june-update) (Intelligence Index v4.1 = 51, open-weights leader), **Ornith-1.0 is at the front of the June-2026 agentic-coding open-weights race**. Against Chinese-origin models (Kimi / GLM), its differentiator is **MIT license + no regional restrictions + a US-flag procurement story**.

**Caveat**: benchmarks are DeepReinforce's own vendor-reported numbers. Independent third-party verification on public leaderboards has not yet appeared (as of June 26, 2026).

The article closes with **three inquiry funnels for Ornith-1.0–era local-LLM evaluation, build, and ongoing maintenance**.
Ornith · DeepReinforce · Open Weight