Oflight Inc.
AI · 2026-06-22

Loop Engineering Deep Dive — The June 2026 Successor to Prompt / Context / Harness Engineering, Crystallized by Anthropic's Boris Cherny ('I don't prompt Claude anymore — I write loops'), Named and Codified by Addy Osmani, with Six Building Blocks (Automations, Worktrees, Skills, Plugins, Maker-Checker Sub-agents, Durable State) Mapped Onto Claude Code's Existing Feature Set

A primary-source deep dive on Loop Engineering, the June 2026 AI-engineering trend named and codified by Google Chrome DevRel lead Addy Osmani in his "Loop Engineering" blog post and elevated to industry attention by a quote from Anthropic Claude Code lead Boris Cherny: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." (reported by The New Stack). Covers the four-generation lineage: Prompt Engineering (2022-2024) → Context Engineering (2025, coined by Shopify CEO Tobi Lütke, formalized in Anthropic's Effective Context Engineering for AI Agents) → Harness Engineering (early 2026) → Loop Engineering (June 2026 onwards). Grounded in Peter Steinberger's seed phrase, "you should be designing loops that prompt your agents", the column maps out the six building blocks: (1) Automations / Trigger (timer- or event-driven heartbeats), (2) Worktrees (isolated git checkouts to prevent parallel sub-agent collisions), (3) Skills (SKILL.md / CLAUDE.md to externalize intent and reduce "intent debt"), (4) Plugins / Connectors via MCP (execution permissions), (5) Maker / Checker Sub-agents (separating generation from verification), and (6) Durable State (memory belongs on disk, not in context). It explains Inner Loop vs Outer Loop; how Claude Code's `/goal`, Automations, Worktrees, Skills, and Sub-agents constitute a ready-made Loop Engineering toolkit; the surge of Japanese coverage on Qiita / Zenn / DevelopersIO / note / OptiMax; and the five major risk vectors: Cognitive Surrender (Osmani's central warning), Loop Brittleness, Verifier mis-grading, HITL approval fatigue, and runaway-loop cost explosion.


TL;DR — Loop Engineering in One Sentence

In June 2026, a new term — Loop Engineering — took hold across the AI-engineering community. The catalyst was a "Loop Engineering" blog post from Google Chrome DevRel lead Addy Osmani, amplified by Anthropic Claude Code lead Boris Cherny's quote (reported by The New Stack):

> "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."

Three takeaways:

1. Latest in the lineage: Prompt Engineering (2022-2024) → Context Engineering (2025) → Harness Engineering (early 2026) → Loop Engineering (June 2026 onwards)
2. Mindset shift: from optimizing a single LLM call to designing the whole system in which an LLM, tools, environments, and humans loop together
3. Already implementable on Claude Code: Automations / Worktrees / Skills / Sub-agents / `/goal` form a ready-made toolkit when reinterpreted through this lens

Origin — Who Coined It, When, and Where

Named by: Addy Osmani (Google Chrome DevRel lead, author of *Learning JavaScript Design Patterns*). Codified in his June 2026 "Loop Engineering" post.

Seed phrase: iOS-era developer Peter Steinberger's line "you should be designing loops that prompt your agents", later picked up and conceptualized by Osmani, Greyling, and others.

Authority moment: Anthropic Claude Code lead Boris Cherny's quote (above) was the decisive endorsement. With the person who actually builds Claude Code saying "I don't prompt anymore; I write loops," the term shot into common usage.

Current status: Anthropic / OpenAI / Cognition have not yet officially adopted the term in product docs. But a coherent definition is already shared across multiple primary and secondary sources, and the term has become the interpretive frame for an existing feature set rather than a new product line.

Four Generations — Prompt → Context → Harness → Loop

| Generation | Focus | Era | Anchor |
|---|---|---|---|
| Prompt Engineering | How you speak to the model (few-shot, CoT, role-play) | 2022-2024 | OpenAI Cookbook, PromptingGuide.ai |
| Context Engineering | What you show the model (history, RAG, state, memory) | 2025 | Coined by Shopify CEO Tobi Lütke; crystallized in Anthropic's Effective Context Engineering |
| Harness Engineering | Scaffolding for a single session (tools, constraints, feedback) | Early 2026 | Karpathy-adjacent "Software 3.0" discourse |
| Loop Engineering | Who orchestrates whom, when, and how often | June 2026 onwards | Addy Osmani, Boris Cherny, Cobus Greyling |

Prompt → Context is a shift from *what to say* to *what to show*. Context → Harness is from *what to show* to *how to support*. And Harness → Loop is from *how to support* to who orchestrates whom and when — a real expansion of the temporal axis and system boundary.

Definition (Synthesized Across Sources)

Addy Osmani: "Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead."

Cobus Greyling (Substack): designing an autonomous system that discovers work, dispatches it to sub-agents, verifies the result, persists state, and schedules the next action — or decides the goal is achieved.

explainx.ai (What Is Loop Engineering?): "What system should I build so the agent finds the work, does it, verifies it, and remembers what it did — without me in the loop at all?"

Common threads: the human stops being the prompter; the agent runs itself; verification and state persistence are built in; termination and next-action decisions are automated.

Inner Loop vs Outer Loop — Where Loop Engineering Acts

The Inner / Outer distinction is everywhere in Loop Engineering discussions:

Inner Loop (classical ReAct): reason → act → observe → repeat, within one session. The Reflexion / ReAct / CodeAct family. A single LLM + tool-call cycle.
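The reason → act → observe cycle can be sketched as a plain Python loop. This is a minimal illustration under stated assumptions, not code from Claude Code or the ReAct paper: `Step`, `inner_loop`, and `stub_policy` are hypothetical names, and the stub policy stands in for what would really be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    thought: str
    action: str
    observation: str


def inner_loop(
    policy: Callable[[list[Step]], tuple[str, str]],  # reason: history -> (thought, action)
    tools: dict[str, Callable[[], str]],              # action name -> zero-argument tool
    max_steps: int = 10,
) -> list[Step]:
    """Repeat reason -> act -> observe until the policy says 'finish' or max_steps is hit."""
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action = policy(history)                     # reason
        if action == "finish":
            break
        observation = tools[action]()                         # act
        history.append(Step(thought, action, observation))    # observe
    return history


# Usage with a stub policy standing in for the model: run the tests once, then stop.
def stub_policy(history: list[Step]) -> tuple[str, str]:
    if not history:
        return ("tests have not run yet", "run_tests")
    return ("tests passed", "finish")


trace = inner_loop(stub_policy, {"run_tests": lambda: "3 passed"})
print(len(trace), trace[0].observation)  # 1 3 passed
```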

Outer Loop (what Loop Engineering operates on):

- Timer / event triggers (Automations)
- Sub-agent spawn (Maker / Checker separation)
- Result verification (Verifier)
- State persistence (STATE.md / JSON / Linear / Jira)
- Next-action decision (goal reached or continue)

Loop Engineering is therefore not Inner-Loop optimization but the design of the Outer Loop that drives, monitors, and terminates the Inner Loop (Oracle Developers — "The Agent Loop Decoded: Three Levels" lays out a three-tier model).
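Here is a minimal sketch of an Outer Loop that drives Inner Loops. It assumes the maker, the checker, and task discovery are supplied as plain callables, and all names are hypothetical. The trigger is left out on purpose: a cron job or webhook handler would simply call `outer_loop`. Each cycle discovers work, dispatches it, verifies the result, and persists state to a JSON file. It stops when nothing is left or `max_cycles` is reached.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def outer_loop(
    discover: Callable[[dict], list],     # find pending work, given the persisted state
    run_inner: Callable[[str], str],      # spawn a maker sub-agent (one Inner Loop per task)
    verify: Callable[[str, str], bool],   # a separate checker judges each result
    state_path: Path,
    max_cycles: int = 5,                  # hard stop so the loop cannot run forever
) -> dict:
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"done": [], "failed": []}
    for _ in range(max_cycles):
        tasks = discover(state)
        if not tasks:                     # next-action decision: nothing left, goal reached
            break
        for task in tasks:
            result = run_inner(task)
            state["done" if verify(task, result) else "failed"].append(task)
        state_path.write_text(json.dumps(state))  # persist at the end of every cycle
    return state


# Usage: two tasks, and the checker rejects one of them.
backlog = ["fix-lint", "add-test"]


def discover(state: dict) -> list:
    seen = set(state["done"]) | set(state["failed"])
    return [t for t in backlog if t not in seen]


with tempfile.TemporaryDirectory() as d:
    final = outer_loop(discover, lambda t: f"patch for {t}",
                       lambda t, r: t != "add-test", Path(d) / "state.json")
print(final)  # {'done': ['fix-lint'], 'failed': ['add-test']}
```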

Six Building Blocks (the Osmani / Greyling Canon)

1. Automations / Trigger — the heartbeat. Scheduled (cron) or event-driven (PR opened, Slack mention, file change) auto-start. Claude Code Automations, Cursor's Agents Window, the `claude-code` GitHub Action. No human pushing a "run" button is the actual point.

2. Worktrees — collision-free parallelism. When multiple sub-agents touch code at once, each one gets an isolated git worktree. Without this, parallel loops collapse into merge conflicts within minutes. Claude Code's Workflow already ships an `isolation: "worktree"` option.

3. Skills (SKILL.md / CLAUDE.md) — externalize intent. Project-specific knowledge (naming, test policy, deploy procedure) belongs in repo-level files, not in every prompt. Osmani: "the more shared context you don't have to restate, the more stable the loop." This is what he calls reducing intent debt.

4. Plugins / Connectors via MCP — granting side-effect authority. Opening PRs, updating tickets, triggering deploys. Without these, the agent can only propose and the loop never closes.

5. Maker / Checker Sub-agents — separation of generation and verification. Anthropic's "How we contain Claude" captured the problem in one line: "the model is too kind to grade its own homework." Generation and verification must be separate agents — separate models, ideally. This is the design choice that drove FrontierCode's strongest published scores.

6. Durable State — memory lives on disk, not in the context window. Long-running loops break the moment you try to use the context window as memory. State belongs in STATE.md, JSON, a Linear / Jira board, or Postgres — read back on each cycle. mem0.ai's "memory-first design" is the canonical articulation.
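In its simplest form, Durable State is a JSON file that every cycle reads at the start and rewrites at the end. The sketch below assumes a local file and a made-up schema (`cycle`, `completed`). STATE.md, a Linear / Jira board, or Postgres would fill the same role. The code writes to a temporary file in the same directory and swaps it in with `os.replace`, so a crash mid-write does not leave a half-written state file behind.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the loop's memory back from disk; start fresh if no file exists yet."""
    if not path.exists():
        return {"cycle": 0, "completed": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically replace the old file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def run_cycle(path: Path, task: str) -> dict:
    state = load_state(path)          # memory comes from disk, not from the context window
    state["cycle"] += 1
    state["completed"].append(task)   # a real loop would record the verified result here
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "state.json"
    run_cycle(p, "fix-lint")
    s = run_cycle(p, "add-test")      # a fresh process would see the same state
print(s["cycle"], s["completed"])     # 2 ['fix-lint', 'add-test']
```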

Who's Pushing the Idea

- Boris Cherny (Anthropic / Claude Code lead): authority source (The New Stack)
- Peter Steinberger: the seed phrase
- Addy Osmani (Google Chrome DevRel): the naming and codification (addyosmani.com)
- Cobus Greyling: the Playbook series on Medium / Substack
- Anthropic Engineering Blog: "How we contain Claude" provides the containment vocabulary
- Steve Kinney: "The Anatomy of an Agent Loop"
- mem0.ai: memory-first articulation

Recurring Patterns

- Plan-Act-Observe-Reflect: the Inner-Loop verb chain; successor to ReAct / Reflexion
- Self-correcting Loop: observe a failing test and fix it immediately; if CI breaks, auto-revert
- Critic / Verifier (Maker-Checker): a separate agent inspects the output; retry until it passes
- Sandbox Loop: Docker / worktree isolation, rollback-first risk tolerance
- Eval-driven Loop: continuous improvement against offline benchmarks (SWE-bench Verified, FrontierCode)
- HITL Loop: human review as the brake; pause for PR approval
- Workflow Pipeline: express complex flows via Claude Code Workflow / Phase / Sub-agents
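For worktrees, Sandbox Loop isolation comes down to giving each sub-agent its own checkout. The sketch below only builds and runs the standard `git worktree add -b <branch> <path> <commit-ish>` command. The branch and directory naming scheme (`agent/<id>`, `<repo>-wt-<id>`) is a hypothetical convention. This is not how Claude Code's `isolation: "worktree"` option is implemented internally.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent_id: str, base: str = "HEAD") -> list[str]:
    """Build the git command that gives one sub-agent its own branch and checkout."""
    branch = f"agent/{agent_id}"                        # one branch per sub-agent
    path = repo.parent / f"{repo.name}-wt-{agent_id}"   # sibling directory, outside the main checkout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, agent_id: str) -> Path:
    """Create the worktree; raises CalledProcessError if git fails (e.g. the branch exists)."""
    cmd = worktree_command(repo, agent_id)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])


cmd = worktree_command(Path("/src/app"), "maker-1")
print(" ".join(cmd))
# git -C /src/app worktree add -b agent/maker-1 /src/app-wt-maker-1 HEAD
```

When a sub-agent finishes, `git worktree remove <path>` deletes its checkout. The branch stays available for review or merge.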

Frameworks and Tools

| Tool | Role as Loop-Engineering toolkit |
|---|---|
| Claude Agent SDK / Claude Code | Automations / Worktrees / Skills / Sub-agents / `/goal`: full kit |
| LangGraph | StateGraph as loop primitive (branching, cycles) |
| OpenAI Agents SDK | Successor to Swarm: handoffs, sessions, guardrails |
| Cognition Conductor / Devin | Long-running tasks integrated with FrontierCode |
| Vercel AI SDK loops | TypeScript-native loop primitives |
| Pydantic AI / Instructor | Typed loops (output schema enforcement) |
| Microsoft AutoGen 2.x | Multi-agent conversation standard |

Claude Code is, de facto, the reference implementation in current discussion.

Benchmarks and Evaluation

Evaluation has shifted from one-shot benches (HumanEval, MBPP) to agent-loop benches (Cognition FrontierCode, SWE-bench Verified, SWE-bench Pro). New axes for measuring loop quality:

- Termination accuracy: infinite-loop suppression, fail-fast performance
- Token efficiency: reducing redundant reasoning
- Cost efficiency: dollars per outcome
- Verifier accuracy: false-positive / false-negative rates of the Checker sub-agent
- Recovery: robustness of restart and state restore after failure
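Several of these axes can be computed directly from run records. The sketch below assumes each run is labeled after the fact with ground truth (did the outcome really meet the goal?) alongside the Checker's verdict. The `Run` record and the exact metric definitions are our assumptions, not a standard from any benchmark. For example, cost per outcome is defined here as total spend divided by successful runs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    succeeded: bool        # ground truth: did the outcome actually meet the goal?
    verifier_passed: bool  # the Checker sub-agent's verdict
    terminated_ok: bool    # stopped on goal or limit, rather than killed as a runaway
    cost_usd: float


def loop_metrics(runs: list[Run]) -> dict:
    n = len(runs)
    good = sum(r.succeeded for r in runs)
    bad = n - good
    fp = sum(r.verifier_passed and not r.succeeded for r in runs)  # checker passed a bad result
    fn = sum(r.succeeded and not r.verifier_passed for r in runs)  # checker rejected a good result
    return {
        "termination_accuracy": sum(r.terminated_ok for r in runs) / n,
        "cost_per_outcome": sum(r.cost_usd for r in runs) / good if good else float("inf"),
        "verifier_fp_rate": fp / bad if bad else 0.0,
        "verifier_fn_rate": fn / good if good else 0.0,
    }


runs = [
    Run(succeeded=True,  verifier_passed=True,  terminated_ok=True,  cost_usd=2.0),
    Run(succeeded=False, verifier_passed=True,  terminated_ok=True,  cost_usd=3.0),
    Run(succeeded=True,  verifier_passed=False, terminated_ok=False, cost_usd=5.0),
    Run(succeeded=False, verifier_passed=False, terminated_ok=True,  cost_usd=2.0),
]
m = loop_metrics(runs)
print(m)
# {'termination_accuracy': 0.75, 'cost_per_outcome': 6.0, 'verifier_fp_rate': 0.5, 'verifier_fn_rate': 0.5}
```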

Japanese Coverage (June 2026)

Qiita / Zenn / DevelopersIO / note coverage exploded in mid-June 2026. Representative articles are listed under Japanese-language coverage in the References.

A bilingual PDF guide — "loop-engineering-orange-book" — has appeared on GitHub from the Chinese-language community.

Risks — The Five Big Concerns

Behind the enthusiasm, the same five risks turn up in every serious treatment:

1. Runaway loop / cost explosion. Loose termination conditions let token use compound with every iteration, because each pass re-reads a growing context. With Claude Opus 4.8 output at $75 per 1M tokens, a runaway loop can hit hundreds of dollars in hours. Hard limits (max iterations / max cost / max duration) are mandatory.

2. Verifier mis-grading. "The model is too kind to grade its own homework." Same model generating and verifying = optimistic self-grading. Use separate agents — ideally separate models — for generator and verifier. Aim for a near-perfect verifier.

3. HITL approval fatigue. Too many notifications → mechanical clicks → the human brake stops working. Approval-granularity design matters — too fine and too coarse both break.

4. Loop brittleness in production. Benchmarks look great; production edge cases collapse. This is the gap FrontierCode was designed to measure.

5. Cognitive surrender. Osmani's central warning. Generation outruns comprehension; engineers approve code they haven't read; intent debt accumulates; the codebase becomes unmaintainable. Osmani repeats: "loops shouldn't go faster than you can understand them."
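One way to make "loops shouldn't go faster than you can understand them" enforceable is backpressure. The loop may not open a new PR while too many generated PRs sit unread. `ReviewGate` and its threshold are hypothetical. This is our illustration of the idea, not a mechanism Osmani prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewGate:
    """Backpressure: block new generated PRs while too many sit unread by a human."""
    max_unreviewed: int = 3
    unreviewed: list[str] = field(default_factory=list)

    def can_generate(self) -> bool:
        return len(self.unreviewed) < self.max_unreviewed

    def submit(self, pr: str) -> None:
        if not self.can_generate():
            raise RuntimeError("review backlog full: wait until a human has read the open PRs")
        self.unreviewed.append(pr)

    def mark_reviewed(self, pr: str) -> None:
        self.unreviewed.remove(pr)


gate = ReviewGate(max_unreviewed=2)
for pr in ["#101", "#102", "#103"]:   # hypothetical PR numbers
    if gate.can_generate():
        gate.submit(pr)               # #103 is held back because the backlog is full
backlog = list(gate.unreviewed)
gate.mark_reviewed("#101")            # a human actually read #101
print(backlog, gate.can_generate())   # ['#101', '#102'] True
```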

Putting It Into Practice — Oflight's Recommended Steps

What we recommend in our AI consulting and software development practice:

Step 1 — Run the Inner Loop on one project. A single small task driven by Claude Code `/goal` or a normal session, with observation logs. Feel where it goes off the rails before scaling.
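For Step 1, an observation log can be as simple as one JSON object per line. The field names below are hypothetical. What matters is that every action and observation is recorded, so you can later find where the session went off the rails.

```python
import json
import tempfile
import time
from pathlib import Path


def log_step(log_path: Path, step: int, action: str, observation: str, **extra) -> None:
    """Append one observation as a JSON line so the session can be replayed later."""
    record = {"ts": time.time(), "step": step, "action": action,
              "observation": observation, **extra}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_log(log_path: Path) -> list[dict]:
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "session.jsonl"
    log_step(log, 1, "edit", "changed parser.py")
    log_step(log, 2, "run_tests", "2 failed", tokens=1800)
    records = read_log(log)

# Where did the session go off the rails?
off_rails = [r["step"] for r in records if "failed" in r["observation"]]
print(off_rails)  # [2]
```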

Step 2 — Separate Maker from Checker. Use Workflow to split generation and verification. Default the verifier to *refute* mode; retry until pass.
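Here is a minimal Maker / Checker loop. It assumes the maker and checker are separate callables; in practice they would be separate agents, ideally running on separate models. The checker defaults to refuting: it returns an objection unless an explicit acceptance criterion is met, and only a `None` result ends the loop. `maker_checker`, `make`, and `check` are hypothetical names.

```python
from typing import Callable, Optional


def maker_checker(
    make: Callable[[str, list[str]], str],       # maker: task + objections so far -> new draft
    check: Callable[[str, str], Optional[str]],  # checker: an objection, or None if it cannot refute
    task: str,
    max_rounds: int = 3,
) -> tuple[Optional[str], list[str]]:
    objections: list[str] = []
    for _ in range(max_rounds):
        draft = make(task, objections)
        objection = check(task, draft)
        if objection is None:          # the checker failed to refute the draft: accept it
            return draft, objections
        objections.append(objection)   # feed the refutation back to the maker and retry
    return None, objections            # out of rounds: escalate to a human instead of looping forever


# Stand-ins for two different models. The checker refutes by default and only
# accepts when an explicit criterion (here: a test is included) is met.
def make(task: str, objections: list[str]) -> str:
    return f"{task} v{len(objections) + 1}" + (" with test" if objections else "")


def check(task: str, draft: str) -> Optional[str]:
    return None if "with test" in draft else "no test covers the change"


result, log = maker_checker(make, check, "fix bug")
print(result, log)  # fix bug v2 with test ['no test covers the change']
```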

Step 3 — Introduce Durable State. Externalize loop state into STATE.md / Linear / Jira / Notion. Stop using the context window as memory.

Step 4 — Automations (timer / event triggers). GitHub Actions, cron, Slack events. Remove human "run" buttons one by one.
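The trigger decision itself is small: start a run when the timer heartbeat is due or a watched event arrives. The event names below are hypothetical placeholders. In practice, GitHub Actions `on:` triggers, cron, or a Slack event subscription would deliver them, and this function only illustrates the decision logic.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical event names; real ones arrive from GitHub webhooks, Slack events, etc.
EVENT_TRIGGERS = {"pull_request.opened", "slack.mention", "file.changed"}


def should_start(now: datetime, last_run: Optional[datetime], interval: timedelta,
                 event: Optional[str] = None) -> bool:
    """Start a loop run if the timer heartbeat is due or a watched event arrived."""
    timer_due = last_run is None or now - last_run >= interval   # cron-style heartbeat
    event_hit = event in EVENT_TRIGGERS                          # event-driven start
    return timer_due or event_hit


t0 = datetime(2026, 6, 22, 9, 0)
hourly = timedelta(hours=1)
print(should_start(t0 + timedelta(minutes=10), t0, hourly))                         # False
print(should_start(t0 + timedelta(minutes=10), t0, hourly, "pull_request.opened"))  # True
print(should_start(t0 + hourly, t0, hourly))                                        # True
```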

Step 5 — Hard limits + HITL gates. Max iterations, max cost, max duration. Keep HITL gates on critical side effects (production deploys, billing, external API).
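Here is a sketch of Step 5's HITL gate. It also shows what connector authority (building block 4) means in code: ungranted tools can only produce a proposal, granted tools run, and granted tools marked critical run only after a human approves. `Tool`, `call_tool`, and the tool names are hypothetical. This is not the MCP API or a Claude Code feature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    critical: bool = False   # e.g. production deploys, billing, external APIs


def call_tool(tool: Tool, arg: str, granted: set[str],
              approve: Callable[[str], bool]) -> str:
    action = f"{tool.name}({arg})"
    if tool.name not in granted:
        return f"PROPOSAL ONLY: {action}"   # no authority granted: the agent can only propose
    if tool.critical and not approve(action):
        return f"BLOCKED: {action}"         # HITL gate: no human approval
    return tool.run(arg)


def deny(action: str) -> bool:
    return False   # stand-in for a human who has not approved yet


open_pr = Tool("open_pr", lambda branch: f"PR opened from {branch}")
deploy = Tool("deploy", lambda env: f"deployed to {env}", critical=True)
update_ticket = Tool("update_ticket", lambda key: f"updated {key}")
granted = {"open_pr", "deploy"}

print(call_tool(open_pr, "agent/maker-1", granted, deny))   # PR opened from agent/maker-1
print(call_tool(deploy, "prod", granted, deny))             # BLOCKED: deploy(prod)
print(call_tool(update_ticket, "OPS-1", granted, deny))     # PROPOSAL ONLY: update_ticket(OPS-1)
```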

Step 6 — Eval-driven improvement. Continuous evaluation against an independent benchmark (e.g., FrontierCode). Quantify cost efficiency, termination accuracy, verifier accuracy over time.

Where It Sits Among Adjacent Concepts

| Concept | Relationship |
|---|---|
| Prompt Engineering | Oldest generation; survives as individual steps inside Loop Engineering |
| Context Engineering | Earlier generation; corresponds to Inner-Loop input design |
| Harness Engineering | Single-session Loop Engineering (no Outer Loop) |
| Agent Loop (Anthropic) | Synonym for Inner Loop; internal Claude Code step |
| Inner / Outer Loop (Devin) | Distinction Loop Engineering adopted |
| ReAct / Reflexion / CodeAct | Canonical Inner-Loop papers |
| Plan-Act-Observe-Reflect | The verb form of an Inner Loop |
| AgentOps / LLMOps | Operations / observability side of Loop Engineering |
| Workflow Engineering | LangGraph / Temporal vocabulary; overlapping concept |

FAQ

Q1. Is Loop Engineering just Prompt Engineering rebranded?
A. No. Prompt Engineering optimizes a single LLM call's input. Loop Engineering designs the Outer Loop system that orchestrates many calls. Different granularity, different time horizon.

Q2. Has Anthropic officially adopted the term?
A. Not as of June 22, 2026. Boris Cherny's quote (The New Stack) is a personal endorsement, and Anthropic's "How we contain Claude" covers the containment vocabulary, but the product docs don't yet use the term. That said, Claude Code's Automations / Worktrees / Skills / Sub-agents are functionally the Loop-Engineering toolkit.

Q3. How does it relate to LangGraph / OpenAI Agents SDK?
A. Loop Engineering is the paradigm; LangGraph and Agents SDK are implementations. LangGraph's StateGraph handles Outer-Loop transitions; sub-agents implement Maker/Checker; checkpoints provide Durable State.

Q4. What is the Japanese term?
A. "ループエンジニアリング" is essentially the agreed translation across Qiita / Zenn / DevelopersIO / note / OptiMax.

Q5. How do you prevent runaway-loop cost?
A. Three layers of hard limit: max iterations (e.g., 100 steps), max cost (e.g., $10 / loop), max duration (e.g., 30 minutes). Whichever trips first stops the loop and pings a human. Claude Code Workflow ships a `budget` parameter for exactly this.

Q6. How do you counter verifier mis-grading by the same model?
A. Four standard moves: (1) use a different model (Claude generates, GPT-5.5 or Gemini 3.1 Pro verifies), (2) adversarial prompts that default to *refute*, (3) majority vote across multiple verifiers, (4) multi-perspective verifiers (correctness / security / performance).

Q7. How do you avoid Cognitive Surrender?
A. (1) Carve out time to actually read the generated code, (2) require human final approval on PRs, (3) record explainability (why this change) in SKILL.md / STATE.md, (4) deliberately slow the loop down: "don't go faster than you can understand," as Osmani repeats.

Q8. How does it relate to the Forward Deployed Engineer (FDE)?
A. FDE is the person who builds the loop on-site with the customer. Loops have to be tuned to organization-specific work and can't be assembled from a generic framework alone. FDE and Loop Engineering are two sides of the same shift.
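The three hard limits from Q5 can be enforced with one small guard object that is charged once per iteration. This sketch assumes cost is estimated from output tokens at the $75/1M output price quoted in the Risks section. It uses the FAQ's example values (100 steps, $10, 30 minutes) as defaults. `Budget` and `BudgetExceeded` are hypothetical names, not the Claude Code Workflow `budget` parameter.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class Budget:
    max_iterations: int = 100
    max_cost_usd: float = 10.0
    max_seconds: float = 30 * 60
    clock: Callable[[], float] = time.monotonic
    iterations: int = 0
    cost_usd: float = 0.0
    started: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def charge(self, output_tokens: int, usd_per_million: float = 75.0) -> None:
        """Record one iteration; raise as soon as any of the three limits trips."""
        self.iterations += 1
        self.cost_usd += output_tokens * usd_per_million / 1_000_000
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"iterations {self.iterations} > {self.max_iterations}")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost ${self.cost_usd:.2f} > ${self.max_cost_usd:.2f}")
        if self.clock() - self.started > self.max_seconds:
            raise BudgetExceeded("duration limit reached")


budget = Budget()   # defaults: 100 iterations, $10, 30 minutes
try:
    while True:
        budget.charge(output_tokens=20_000)   # hypothetical 20k output tokens per iteration
except BudgetExceeded as exc:
    msg = str(exc)   # in a real loop: stop, persist state, and notify a human here
print(budget.iterations, msg)  # 7 cost $10.50 > $10.00
```

At 20,000 output tokens per iteration, each pass costs $1.50, so the $10 cap trips on the 7th iteration ($10.50), long before the 100-step limit.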

Bottom Line

Loop Engineering is the AI-engineering vocabulary that landed in June 2026. Addy Osmani gave it the name; Anthropic's Boris Cherny — "I don't prompt Claude anymore; I write loops" — gave it authority. It sits at the leading edge of the Prompt → Context → Harness → Loop lineage and embodies the shift from optimizing one LLM call to designing the entire system in which LLMs, tools, environments, and humans loop together.

Operationally, Claude Code's existing feature set is the de-facto reference implementation — and a viable minimum adoption stack is Maker-Checker separation + Durable State + Hard Limits + HITL gates. The five risks to manage are runaway loops, verifier mis-grading, HITL fatigue, Loop brittleness, and cognitive surrender — with the last being Osmani's most insistent warning.

Outlook: When Anthropic / OpenAI / Cognition formally adopt the term in product docs, it locks in as the industry vocabulary. Through the second half of 2026, expect ecosystem expansion: independent benchmarks of loop quality (in the FrontierCode lineage), loop debuggers (a DevTools-style category), and loop marketplaces (industry-specific standard patterns to share).

References

Naming and ideas:
- Addy Osmani: Loop Engineering
- The New Stack: Loop Engineering / Boris Cherny
- Cobus Greyling: Loop Engineering (Substack)
- Cobus Greyling: Loop Engineering Playbook (Medium)
- explainx.ai: What Is Loop Engineering?
- Lushbinary: Loop Engineering Guide

Technical:
- Anthropic: Effective Context Engineering for AI Agents
- Anthropic: How we contain Claude
- Oracle Developers: Agent Loop Decoded
- Steve Kinney: Anatomy of an Agent Loop
- mem0.ai: Memory-First Design
- Callsphere: Inside Claude Code's Agent Loop

Japanese-language coverage:
- Qiita: y-morimatsu
- Qiita: Simon_Zhang
- Qiita: Syoitu
- Zenn: suwash
- Zenn: ino_h
- Zenn: acrosstudioblog
- DevelopersIO
- note: MAKE A CHANGE
- OptiMax
- GitHub: loop-engineering-orange-book

Related columns:
- Claude Code Agent View parallel orchestration
- Cognition FrontierCode benchmark
- cmux (Manaflow)
- Cursor Automations / Agents Window
- Forward Deployed Engineer (FDE)
- Hermes Agent Skills/Tools
- Kimi K2.7-Code
- Claude Agent SDK Credit Billing Change
