Loop Engineering vs Harness Engineering: The Difference

Loop engineering decides what an agent does and when it stops. Harness engineering decides where it runs, what it can touch, and how it recovers. This guide covers the boundary between them, the failure modes of each, and which to build first.

Shirley · July 2, 2026 · 7 min read

Short answer: loop engineering decides what the agent does and when it stops - the goal, the generate-verify cycle, the retry bound, the trigger. Harness engineering decides where it runs and what it can touch - the sandbox, the tools, the permissions, the state, the recovery. The loop is the choreography. The harness is the stage. They arrived as buzzwords a few months apart, which makes them look like competing fads - but they name different layers of one system, and confusing them is how you end up solving the wrong problem with the wrong tool.

The confusion is understandable. Both terms exploded within about six months of each other: harness engineering gained traction in late 2025 as OpenAI and Anthropic published how their production agents actually run, and loop engineering got its name in June 2026 when Peter Steinberger, Boris Cherny, and Addy Osmani converged on "stop prompting agents, design the loops that prompt them." Same season, same community, overlapping vocabulary. But ask each discipline its core question and they split cleanly:

Loop engineering asks: does the work keep happening - and does it know when to stop?
Harness engineering asks: is the environment it happens in safe, observable, and recoverable?

What Loop Engineering Owns

Loop engineering is the discipline of designing the cycle an agent runs - and, critically, its exit. Osmani's naming post lays out the anatomy: a specific goal with testable termination conditions, context management across iterations, explicit failure exits, and error handling that produces genuine adaptation rather than retries of the same failed approach. In practice, the loop engineer owns four decisions:

The goal - stated as a testable "done," not a vibe. "Tests green" is a goal. "Improve the code" is a prayer.
The verifier - the half of the loop that judges output. In any loop the generator runs cheaply, over and over, so the verifier is the bottleneck that decides whether all that motion produces value.
The bound - max N rounds, max budget, max wall-clock. An unbounded loop is not autonomous; it's a bill that hasn't arrived yet.
The trigger - what wakes the loop: cron, webhook, another agent, a new issue. This is what makes the work recurring instead of one-shot.
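
Three of those four decisions fit in a few lines of code. What follows is a minimal sketch, not anyone's reference implementation: `run_loop`, `LoopResult`, and the `generate` / `verify` callables are hypothetical names invented for illustration, and the trigger (cron, webhook, another agent) is assumed to live outside the function and simply call it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LoopResult:
    done: bool                    # True only if the verifier accepted an output
    rounds: int                   # generate-verify rounds actually run
    reason: str                   # "verified", "max_rounds", or "timeout"
    output: Optional[str] = None  # the accepted output, if any


def run_loop(
    generate: Callable[[str], str],              # hypothetical: calls the agent
    verify: Callable[[str], Tuple[bool, str]],   # hypothetical: (passed, feedback)
    goal: str,
    max_rounds: int = 5,
    max_seconds: float = 600.0,
) -> LoopResult:
    """Run generate -> verify until the verifier passes or a bound is hit."""
    started = time.monotonic()
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        # Wall-clock bound, checked before each new round starts.
        if time.monotonic() - started > max_seconds:
            return LoopResult(False, round_no - 1, "timeout")
        # Feed the verifier's last complaint back in, so a retry is an
        # adaptation rather than a repeat of the same failed attempt.
        prompt = goal if not feedback else f"{goal}\n\nPrevious attempt failed: {feedback}"
        output = generate(prompt)
        ok, feedback = verify(output)
        if ok:
            return LoopResult(True, round_no, "verified", output)
    # Explicit failure exit: the round bound was reached without a pass.
    return LoopResult(False, max_rounds, "max_rounds")
```

The design choice worth noticing is that every exit is named. The caller can tell "verified" from "max_rounds" from "timeout" without reading a transcript, which is what separates a bounded loop from a hopeful one.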

Notice what's not on the list: nothing about sandboxes, permissions, or which tools exist. The loop engineer assumes an environment and choreographs behavior inside it. That's why Andrew Ng can describe his three loops - agentic coding, developer feedback, external feedback - without ever mentioning infrastructure. Loops are a design object independent of any particular stage.

What Harness Engineering Owns

The harness is everything around the model that turns raw capability into directed, recoverable work. LangChain's definition is the cleanest: Agent = Model + Harness, so Harness = Agent − Model. Six components carry the load: context management, the tool system, orchestration, state and memory, evaluation and observability, and constraints and recovery. The harness engineer owns a different set of four decisions:

The boundary - what the agent can read, write, and reach. Filesystem scope, network allowlists, OS-level sandboxing.
The tools - which capabilities exist at all, how they authenticate, what their rate limits are.
The telemetry - traces, logs, and evals so you know what happened inside a run without re-reading every transcript.
The recovery - what happens at failure: checkpoints, rollbacks, fresh-context resets, escalation to a human.
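
To make the boundary and telemetry decisions concrete, here is a minimal sketch of a write guard. It assumes a single project root as the filesystem scope; `guarded_write`, `BoundaryError`, and the `trace` list are hypothetical names for illustration. It is an application-level check only and does not replace OS-level sandboxing.

```python
from pathlib import Path
from typing import Dict, List


class BoundaryError(PermissionError):
    """Raised when a write would land outside the allowed root."""


def guarded_write(root: Path, relative_path: str, text: str, trace: List[Dict]) -> Path:
    """Write text under root, refusing any path that resolves outside it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # runs against where the write would really land.
    target = (root / relative_path).resolve()
    allowed = root in target.parents
    # Telemetry: record every attempt, allowed or not, before acting on it.
    trace.append({"op": "write", "path": str(target), "allowed": allowed})
    if not allowed:
        raise BoundaryError(f"write outside {root} refused: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

A call such as `guarded_write(Path("."), "../notes.txt", "x", trace)` raises `BoundaryError` and still leaves a trace entry, so a refused write is visible rather than silent.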

Notice again what's not on the list: the harness doesn't know what "done" means for your task, and it doesn't decide when to run again. A perfect harness will faithfully and safely execute a pointless loop forever.

The Boundary, In One Table

	Loop Engineering	Harness Engineering
Core question	Does it run - and stop - without me?	Is the run safe, observable, recoverable?
Designs	Goal, verifier, bound, trigger	Boundary, tools, telemetry, recovery
Unit of work	The cycle (and its re-runs)	The environment (per run)
Fails as	Runaway cost, false "done", human-as-cron-job	Escaped writes, silent failures, unrecoverable state
Skill core	Defining "good" and "done"	Systems and infrastructure design
Named by	Osmani/Steinberger/Cherny, June 2026	OpenAI/Anthropic/LangChain, late 2025
Our deep dive	Loop engineering guide	Production harness guide

Structurally, the loop runs inside the harness - every iteration executes in the environment the harness provides. But they're separable design objects: the same loop can move across harnesses (your laptop's sandbox, CI, a cloud worktree), and one harness hosts many loops. That separability is exactly why two different job-title-shaped terms emerged instead of one.

What Happens If You Have One Without the Other

A loop without a harness is an unattended agent with unscoped access. It works - right up until the iteration where the agent decides the fastest path to "tests green" is deleting the failing tests, or the retry storm hits a production API 400 times. Every horror story in the reliability and cost guide is some version of this: the loop ran fine; nothing constrained what running meant. Uber's $1,500/month per-engineer cap is what it looks like when the harness layer gets built by the finance department instead.

A harness without a loop is a very safe system waiting for you to press the button. This is most production agent setups today: excellent sandboxing, curated tools, beautiful traces - and a human manually kicking off every run, reading every result, deciding every next step. Nothing is wrong, exactly. But you're the cron job, and the fourth shift hasn't happened for you yet.

The diagnostic version: runaway bills and false completions are loop bugs. Escaped writes and unexplainable failures are harness bugs. If your agent burned $80 overnight producing garbage, no amount of sandboxing would have saved you - the verifier and the bound were missing. If your agent modified a file outside the project, no verifier would have caught it - the boundary was missing.
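
The same diagnostic can be written down as a lookup table, which is handy in a postmortem template. This is a sketch under assumptions: the symptom keys are invented labels, and pairing each symptom with a single missing component is a simplification of the failure modes listed in the table above, not a rule from any of the sources.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Layer(Enum):
    LOOP = "loop engineering"
    HARNESS = "harness engineering"


class Diagnosis(NamedTuple):
    layer: Layer   # which discipline owns the fix
    missing: str   # the design decision most likely absent


# Symptom labels are hypothetical; the pairings follow the article's examples.
TRIAGE: Dict[str, Diagnosis] = {
    "runaway_cost": Diagnosis(Layer.LOOP, "bound"),
    "false_done": Diagnosis(Layer.LOOP, "verifier"),
    "human_as_cron_job": Diagnosis(Layer.LOOP, "trigger"),
    "escaped_write": Diagnosis(Layer.HARNESS, "boundary"),
    "unexplained_failure": Diagnosis(Layer.HARNESS, "telemetry"),
    "unrecoverable_state": Diagnosis(Layer.HARNESS, "recovery"),
}


def triage(symptom: str) -> Diagnosis:
    """Map an observed symptom to the layer and component to inspect first."""
    try:
        return TRIAGE[symptom]
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None
```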

Which Should You Build First?

Loop first - because you probably don't need to build a harness at all to start.

Off-the-shelf agent products ship serious harnesses now: Claude Code has sandboxing, permission modes, hooks, and tool scoping out of the box; the same is true across the major coding agents. Which means the highest-leverage work for most builders is loop work on top of a rented stage: write a testable goal, wire a verifier the agent can't skip, pin a retry bound, add a trigger. That's an afternoon, and it's the difference between prompting an agent and running a system.

Custom harness work earns its keep when the defaults stop fitting:

You need stricter isolation than the product default (regulated data, production credentials)
You're building custom tools with their own auth and rate limits
You need team-level observability - who ran what, what did it cost, what did it touch
Your loops run unattended at scale, where "read the transcript" stops being a monitoring strategy

The build order for a real project, then: (1) testable goal + verifier, (2) bound + trigger, (3) tighten the boundary to match the autonomy, (4) telemetry once more than one loop runs, (5) recovery once a failed run costs real money. Steps 1-2 are loop engineering. Steps 3-5 are harness engineering. You climb into the harness work as autonomy grows - not before.

Two Terms, One System

The names arrived in sequence, so it's tempting to read them as a fad replacing a fad - 2025 was "the year of agent harnesses," then June 2026 renamed everything "loops." The full four-shift arc shows why that reading is wrong: each term stuck because task complexity broke the previous layer. Harness engineering answered "does it keep doing the right thing across a long run?" Loop engineering answered the question that emerged the moment harnesses got good: "then why am I still the one starting every run?"

Production teams don't choose between them. The choreography needs a stage; the stage exists for the choreography. Learn to write the verifier, and rent the harness until you can't.

Loop Engineering: Stop Writing Prompts, Start Writing Verifiers - The full loop discipline: open vs closed loops, verifier design, and the starting checklist.
Harness Engineering: What OpenAI and Anthropic Changed - The production harness playbook from the teams that named it.
Harness: The 6 Components - Context, tools, orchestration, state, evaluation, recovery - the stage, piece by piece.
From Prompts to Loops: The 4 Shifts That Redefined AI Engineering - Where both terms sit in the larger ratchet.
AI Agent Reliability and Cost Control - What loop bugs cost in production, and the 7 levers that fix them.

Start Here

Take one agent task you run by hand. Write its "done" as something a script can check, add a bounded retry, and put it on a trigger - that's your first loop, on the harness you already have. When it runs unattended for a week without surprising you, then ask whether the stage needs work.
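
As a starting point, "done as something a script can check" can be as small as an exit code. The sketch below assumes your check is a command that exits 0 on success; `is_done`, `run_until_done`, and the `attempt` callable are hypothetical names, and the trigger is left to whatever scheduler you already use.

```python
import subprocess
from typing import Callable, List


def is_done(check_cmd: List[str], timeout_s: float = 300.0) -> bool:
    """'Done' means the check command exits 0 within the timeout."""
    try:
        result = subprocess.run(check_cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a check that hangs is not a pass
    return result.returncode == 0


def run_until_done(
    attempt: Callable[[int], None],  # hypothetical: one agent run, given the attempt number
    check_cmd: List[str],
    max_attempts: int = 3,
) -> int:
    """Return the attempt number that passed the check, or 0 if the bound was hit."""
    for n in range(1, max_attempts + 1):
        attempt(n)
        if is_done(check_cmd):
            return n
    return 0
```

For example, `run_until_done(run_agent_once, ["pytest", "-q"])` would re-run a hypothetical `run_agent_once` function until the test suite passes or three attempts are spent. A return value of 0 is the explicit failure exit, and that is the case worth alerting on.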

Frequently Asked Questions

What is the difference between loop engineering and harness engineering?

Loop engineering designs what the agent does and when it stops: the goal, the generate-verify cycle, the retry bound, and the trigger that re-runs it. Harness engineering designs where the agent runs and what it can touch: the sandbox, tools and permissions, state, observability, and recovery. The loop is the choreography; the harness is the stage.

Is the loop part of the harness, or the harness part of the loop?

Structurally, the loop runs inside the harness - the harness supplies the environment every loop iteration executes in. But design-wise they are separate objects: you can move the same loop onto a different harness (local sandbox to CI to a cloud worktree), and one harness can host many different loops.

Which should I learn first, loop engineering or harness engineering?

Loop first. A minimal loop - a testable goal, a verifier, a retry bound - delivers value on top of an off-the-shelf harness like Claude Code, which already ships sandboxing, permissions, and tools. You only need custom harness work when the defaults stop fitting: stricter isolation (regulated data, production credentials), custom tools, team-level observability, or loops running unattended at scale.

Do I need both a loop and a harness?

For anything unattended, yes. A loop without a harness is an autonomous agent with unscoped access - it works until the day it modifies something it shouldn't. A harness without a loop is a very safe system that still waits for you to press the button every time. Autonomy needs the loop; safety at autonomy needs the harness.

Did loop engineering replace harness engineering?

No. The terms arrived a few months apart (harness engineering gained traction in late 2025, loop engineering in June 2026), which makes them look like successive fads, but they name different layers of the same system. Production teams do both: the harness papers from OpenAI and Anthropic and the loop framing from Osmani and Ng describe the same architecture from two sides.

Sources & Verification

Definitions are synthesized from the primary sources below (Osmani's naming post, LangChain's harness anatomy and loop-engineering essays, Andrew Ng's Batch letter) and cross-checked against our deeper guides on each discipline, plus a last-30-days scan of the X/Reddit/HN conversation (June-July 2026) where the two terms are actively being confused. The failure-mode examples are composites of patterns builders report, not benchmarks. See our editorial standards.

Loop Engineering: Designing loops that prompt coding agents (Addy Osmani) - The post that named loop engineering: goal, tools, context, failure exits, adaptive error handling
The Anatomy of an Agent Harness (LangChain) - Agent = Model + Harness; the components that make up a harness
The Art of Loop Engineering (LangChain) - The loop as the unit of agent behavior design, from the team that also defined the harness
My 3 key loops for building 0-to-1 products (Andrew Ng, The Batch) - Nested loops at different time scales - evidence that 'the loop' is a design object independent of any harness
awesome-harness-engineering (GitHub) - Community-curated harness scope: tools, patterns, evals, memory, MCP, permissions, observability, orchestration
