Loop Engineering Guide (2026)

Loop engineering is the 2026 shift from prompting agents to designing the loops they run. Why the verifier is the bottleneck, open vs closed loops, and a copy-paste checklist.

Loop engineering is the discipline of designing the loop an agent runs inside - what it does between tool calls, when it checks its own work, and how it decides it's finished - instead of hand-writing each prompt. It's the 2026 successor to prompt engineering. The model writes the prompts now. The scarce skill is defining what "good" and "done" mean, and the part almost every explainer skips is this: in any loop, the verifier is the bottleneck, not the model.

A few mornings ago I typed one sentence into Claude Code and went to make coffee. By the time I got back, something like forty agents had spun up, written code, checked each other's work, thrown away the bad attempts, and left me three pull requests to review.

I didn't write forty prompts. I wrote one loop.

If you've felt the ground shift under "prompt engineering" lately, this is why. Here's the decode - not the hype version, the version you can use this afternoon.


How Did We Get From Prompt Engineering to Loop Engineering?

It helps to see the line this sits on:

| Year | Era | What you do | Your role |
| --- | --- | --- | --- |
| 2024 | Prompt engineering | Write a good prompt, get a good output | Operator |
| 2025 | Parallel agents | Stop babysitting one chat, run several at once | Manager |
| 2026 | Loop engineering | Build the loop that runs the agents for you | System designer |

The people behind Claude Code have said the quiet part out loud: some mornings they aren't really writing the prompts anymore - another model is - and they're managing hundreds, even thousands, of agents at a time. That sounds like a flex until you see what it implies: if you're not in the loop pressing enter between every step, something else has to decide when the work is good enough. That something else is the whole game.


What Is a Loop, Exactly?

Strip the jargon and a loop is four moves on repeat:

discover → plan → execute → verify → (repeat until a condition is met)

You used to be the loop. You were the thing standing between the agent's steps, reading the output, catching the mistake, deciding what happens next, telling it to try again. Loop engineering is the discipline of stepping out of that inner cycle and up to designing the track the agent runs on.

Here's the simplest possible version. Give the agent a goal and a stopping condition, and let it run:

```text
Goal: get the test suite passing.
Loop: run the tests, read the failures, fix the most likely cause, run again.
Stop when: all tests green, or you've tried 6 rounds (then summarize what's left).
```
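
If you'd rather drive that loop from a script than from a prompt, here's a minimal Python sketch. Treat it as one way to wire it, not the way: `attempt_fix` is a hypothetical stand-in for whatever agent call reads the failures and edits the code, and `pytest -q` is only an assumed test command.

```python
import subprocess
from typing import Callable, Sequence

Outcome = tuple[bool, str]  # (all green?, raw test output)


def run_tests(cmd: Sequence[str]) -> Outcome:
    """Run the test command and report whether it exited cleanly."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def fix_until_green(
    attempt_fix: Callable[[str], None],          # hypothetical agent call
    test_cmd: Sequence[str] = ("pytest", "-q"),  # assumed test command
    max_rounds: int = 6,
    run: Callable[[Sequence[str]], Outcome] = run_tests,
) -> dict:
    """Run tests, hand failures to the agent, stop when green or out of rounds."""
    passed, output = run(test_cmd)
    rounds = 0
    while not passed and rounds < max_rounds:
        attempt_fix(output)              # agent reads the failures and edits code
        rounds += 1
        passed, output = run(test_cmd)   # re-check after every fix
    return {
        "status": "green" if passed else "gave_up",
        "rounds": rounds,
        "remaining": "" if passed else output,  # what's left for a human to read
    }
```

The `run` parameter is injectable so you can dry-run the loop against a fake test runner before you let it touch a real repo.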

That's a loop. And it genuinely works on a simple task. But point a loop at something open-ended - "improve this app," "make this page better," "research X and write it up" - and it either produces something great or it quietly turns into a very expensive slop machine. The difference between those two outcomes is the part nobody talks about.


Why Is the Verifier the Bottleneck, Not the Generator?

Every loop has two halves. The generator produces work - that's the model, and models are now extremely good. The verifier judges whether that work is good. Put it plainly: a loop is just a generator wired to a verifier, and the generator was never the bottleneck. The verifier is.

For two years we obsessed over the generator. We tuned prompts, swapped models, argued about temperature. But in a loop, the generator runs over and over, and each extra attempt is cheap. The thing that decides whether all that motion produces value is the verifier.

And the freer you let the loop run, the more everything rides on the verifier. A loop with a weak "good enough?" check doesn't fail loudly. It succeeds at producing garbage, confidently, hundreds of times.
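
Here's a toy sketch of that wiring, with everything in it assumed for illustration: the "generator" just cycles through three canned headlines instead of calling a model, and the two verifiers are hand-written rules. The only thing that changes between the two runs is the verifier.

```python
from typing import Callable, Optional

DRAFTS = [
    "Welcome to our website",
    "The best platform for everything you need",
    "Send invoices in two clicks",
]


def generate(round_no: int) -> str:
    """Stand-in for a model call: returns the next canned draft."""
    return DRAFTS[(round_no - 1) % len(DRAFTS)]


def weak_verifier(text: str) -> bool:
    """'Looks fine to me': accepts anything non-empty."""
    return bool(text.strip())


def strict_verifier(text: str) -> bool:
    """Assumed bar: short, and it names the job the product does."""
    return len(text.split()) < 12 and "invoice" in text.lower()


def run_loop(
    gen: Callable[[int], str],
    verify: Callable[[str], bool],
    max_rounds: int = 100,
) -> tuple[Optional[str], int]:
    """Return the first candidate the verifier accepts, plus the round count."""
    for round_no in range(1, max_rounds + 1):
        candidate = gen(round_no)
        if verify(candidate):
            return candidate, round_no
    return None, max_rounds
```

With `weak_verifier` the loop ships the very first draft. With `strict_verifier` it keeps going until a draft clears the bar. Same generator, different outcome.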

This is the same shift Addy Osmani keeps pointing at: the bottleneck moved from writing code to proving it works. Review, judgment, taste, knowing what "correct" looks like - that's now the most leveraged skill an engineer has. It's the same lesson agent evaluation teaches at the production layer: a model grading its own homework tends to give itself an A. In a loop-engineering world, your taste isn't a soft skill anymore. It's the reward function.


Open Loop vs Closed Loop: Which Should You Build?

Once you accept that the verifier is the point, the engineering decision gets clear. Every loop sits somewhere between two poles.

| | Open loop | Closed loop |
| --- | --- | --- |
| What it is | Give a goal and loose conditions, let it explore a wide space | Pin success criteria in advance, evaluate every step, define an explicit stop |
| Upside | Genuinely novel output. Surprising solutions live here | Runs on a normal budget. Predictable. Safe to leave alone |
| Downside | Burns tokens. Degrades into slop fast with loose criteria | Won't surprise you. Does what you specified, not more |
| Lives or dies by | The verifier (even more so) | The verifier |

The actual engineering is two decisions:

  1. Choose open or closed for this specific task. Roughly: how much do I need novelty, and how much budget am I willing to risk?
  2. Write the verifier that matches. A closed loop needs hard, checkable passes. An open loop needs an even better verifier, because it's the only thing standing between exploration and slop.

What Does a Loop With a Weak Verifier Actually Produce?

The most useful thing isn't a loop that works. It's the same task run two ways.

TAKE 1 - OPEN LOOP, WEAK VERIFIER

"Make this landing page better. Keep iterating." No definition of "better." It rewrites the hero eight times, each version different, none clearly better, all plausible. It reports success. You spent real money to get motion without progress. That's the slop machine - and almost everyone's first agent loop is exactly this.

TAKE 2 - CLOSED LOOP, EXPLICIT VERIFIER

Same task, but every iteration has to clear a bar you defined. Motion can only happen in the direction of "better." The loop converges instead of wandering.

Here's Take 2 written out:

```text
Goal: improve landing-page conversion clarity.
Done when ALL pass:
  - Lighthouse accessibility score >= 95
  - Exactly one primary CTA above the fold
  - Hero headline states the value prop in <12 words
  - Minimal layout shift (CLS < 0.1)
Loop: propose a change -> run the checks -> keep it only if no passing check regresses
      -> stop when all green or after 5 rounds.
```
Now the loop converges, because every iteration clears a bar you defined. Then comes the trick that rescues open loops too: keep those hard checks as the floor, and add one open instruction ("surprise me with the headline"). Now you get exploration that can't degrade below your standard.
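
As a rough Python sketch of that closed loop: the page is reduced to a dict of measured numbers, and `proposals` is a hypothetical list of changes standing in for what the agent would suggest each round. I'm also assuming "keep" means no check that already passed starts failing, so the hard checks act as a ratchet.

```python
from typing import Callable

Page = dict[str, float]  # toy stand-in for measured page metrics

CHECKS: dict[str, Callable[[Page], bool]] = {
    "a11y >= 95": lambda p: p["a11y"] >= 95,
    "one primary CTA": lambda p: p["primary_ctas"] == 1,
    "headline < 12 words": lambda p: p["headline_words"] < 12,
    "CLS < 0.1": lambda p: p["cls"] < 0.1,
}


def passing(page: Page) -> set[str]:
    """Names of the checks this page currently clears."""
    return {name for name, check in CHECKS.items() if check(page)}


def closed_loop(page: Page, proposals: list[Page], max_rounds: int = 5):
    """Apply proposed changes one per round; keep only non-regressing ones."""
    log = []
    for round_no, patch in enumerate(proposals[:max_rounds], start=1):
        candidate = {**page, **patch}
        # Subset test: everything that passed before must still pass.
        kept = passing(page) <= passing(candidate)
        if kept:
            page = candidate
        log.append({"round": round_no, "kept": kept,
                    "passing": sorted(passing(page))})
        if len(passing(page)) == len(CHECKS):
            break  # all green: stop early
    return page, log
```

A change that raises the accessibility score but breaks layout stability gets thrown away, so the page can only move toward the bar. The returned `log` is the run data you hand back with the result.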

The lesson isn't "closed loops are better." It's: the verifier is what makes either kind ship.


What Tools Can Run the Loop For You?

You don't have to build the loop machinery from scratch. There's a clear lineage, from rigid to autonomous:

| Tool / pattern | How it decides "done" | Best for |
| --- | --- | --- |
| "Ralph" loop | A bare shell while loop re-fires the same prompt until you stop it - no smart "done," you are the stop button | Mechanical, repetitive tasks. Crude but bulletproof |
| Claude Code /goal | A small fast model judges the stop condition after every turn | Fuzzy "done" that still needs evaluating |
| Goal-tracking setups | Agent tracks its own progress in files, defines "done" up front | Long runs that need to stay oriented |
| Self-hosted agents (Hermes-style) | Runs continuously on its own infrastructure, keeps state across sessions | Always-on agents you set up once and leave running |

/goal is the cleanest entry point: you type a completion condition once, and after each turn a Haiku-class evaluator reads the transcript and decides yes or no, looping until it holds (or you hit your turn cap). One catch worth knowing - the evaluator only sees what Claude already printed, so your condition has to be provable from the agent's own output.
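
To make that catch concrete, here's a toy Python stand-in. None of this is Claude Code's API: a regex plays the part of the model-based evaluator, and the "N passed, M failed" summary format is an assumption. The point is only that a judge that sees nothing but the transcript can't confirm a claim the agent never printed evidence for.

```python
import re


def condition_holds(transcript: str) -> bool:
    """Toy evaluator: sees only the transcript, never the repo or the test run."""
    match = re.search(r"(\d+) passed, (\d+) failed", transcript)
    if match is None:
        return False  # no printed evidence, so the condition is unprovable
    passed, failed = int(match.group(1)), int(match.group(2))
    return passed > 0 and failed == 0


silent = "I fixed the bug and everything works now."
evidenced = "Ran the suite.\n42 passed, 0 failed\nDone."
```

`condition_holds(silent)` is false even if the fix was real, while `condition_holds(evidenced)` is true. So phrase the condition around something the agent has to print, like a test summary or an exit code.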

Pick the least autonomous tool that does the job. Autonomy is not the goal. A shipped result is the goal. When the work fans out wide and repetitive, push the loop into a script with dynamic workflows instead of babysitting it.


Why Writing a Verifier Is Like Defining a Reward Function

If you've touched reinforcement learning, this clicks instantly. In RL you don't tell the agent every move - you define the reward, and the agent iterates toward it on positive and negative signal.

That's exactly what you're doing here. You are not training the model. You are defining the reward: the end goal, and what counts as good. Your domain knowledge - knowing what correct looks like in your problem - is the moat. The model is a commodity. The reward function is yours.
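
A graded version of the same idea, as a small Python sketch. The rubric is entirely made up for illustration (an invoicing product, a house style against exclamation marks). What matters is the shape: you write the scoring function, and the loop just keeps whatever scores highest.

```python
def reward(headline: str) -> float:
    """Score a headline against a hand-written rubric (higher is better)."""
    words = headline.lower().split()
    score = 0.0
    if len(words) < 12:
        score += 1.0   # short enough to scan
    if any(word.startswith("invoic") for word in words):
        score += 1.0   # names the job the product does (assumed domain)
    if headline.rstrip().endswith("!"):
        score -= 0.5   # assumed house style: no shouting
    return score


def best_of(candidates: list[str]) -> str:
    """Keep the candidate with the highest reward; ties go to the earliest."""
    return max(candidates, key=reward)
```

Swap in your own rubric and the generator never needs to change. That's the sense in which the domain knowledge lives in the reward, not in the model.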

This is also why the field keeps drifting toward harness engineering: the leverage isn't in the phrasing anymore, it's in the system around the model - context, tools, state, and the evaluation loop that decides when to stop. Loop engineering is that same move, named from the loop's point of view.

Which is why I keep saying it: writing the verifier is the new prompt engineering.


Is Loop Engineering Just a New Buzzword?

I'd be doing you a disservice not to flag the counter-take, because it's fair. Some sharp people point out that "loop engineering" is partly a fresh term coined to ride attention, and some of the content rushing to use it is repackaging things agent builders already knew.

So here's your filter. Ignore the word. Look at whether the underlying shift is real:

  • Are you increasingly designing systems that run agents, instead of prompting agents directly? Yes.
  • Is the scarce, valuable part now defining "done" and "good," rather than phrasing the request? Yes.
  • Does a loop with a weak verifier reliably produce expensive garbage? Also yes.

The label is optional. The shift is not. Whether you call it loop engineering or just "building agents that don't waste my money," the move is the same: stop perfecting prompts, start writing verifiers.


Your Loop Engineering Starting Checklist

If you build one loop this week, run it through this:

  1. Define "done" in measurable terms before you write a single instruction.
  2. Pin the passes - the tests, the rubric, the eval - up front, not after.
  3. Choose open vs closed by weighing how much novelty you need against how much budget you'll risk.
  4. Always attach the run data to whatever you hand back to a human.
  5. Use the least autonomous harness that gets the result.
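
One way to hold yourself to the first four items is to make them fields you can't skip. This is a sketch under my own naming (`LoopSpec` and its fields are hypothetical, not from any library). The fifth item is a judgment call about tooling, so it stays outside the code.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

Check = Callable[[dict], bool]


@dataclass
class LoopSpec:
    done_when: str                    # 1. "done" in measurable terms
    checks: dict[str, Check]          # 2. passes pinned before the run
    mode: Literal["open", "closed"]   # 3. chosen per task
    max_rounds: int = 5               # hard cap so the loop cannot run forever
    run_log: list[dict] = field(default_factory=list)  # 4. evidence for the human

    def __post_init__(self) -> None:
        # Refuse to build a loop that has no definition of done.
        if not self.done_when.strip() or not self.checks:
            raise ValueError("define 'done' and pin at least one check first")

    def evaluate(self, round_no: int, state: dict) -> bool:
        """Run every pinned check, log the results, report whether all pass."""
        results = {name: check(state) for name, check in self.checks.items()}
        self.run_log.append({"round": round_no, "results": results})
        return all(results.values())
```

Whatever drives the loop calls `evaluate` once per round, and `run_log` is what you attach when you hand the result back.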

Do that and you've crossed the line from someone who prompts AI to someone who engineers the system that does the work. That's the whole skill. The rest is reps.


Frequently Asked Questions

What is loop engineering?

Loop engineering is the practice of designing the loop an AI agent runs inside - the discover, plan, execute, verify cycle - and defining its stop condition, instead of hand-writing each prompt. The human moves from operator to system designer. The core skill is writing the verifier that decides whether the agent's output is good enough.

How is loop engineering different from prompt engineering?

Prompt engineering optimizes a single request to a model. Loop engineering optimizes the system that runs the model repeatedly: what it does between tool calls, when it checks its work, and when it stops. As models got good enough to write their own prompts, the leverage moved up a level - to defining "good" and "done" rather than phrasing the ask.

What is a verifier in an agent loop?

The verifier is the half of a loop that judges whether produced work meets the bar. The generator (the model) produces; the verifier decides if it ships, retries, or stops. In a loop the generator runs cheaply over and over, so the verifier - not the model - is the bottleneck that determines whether all that motion produces value.

What's the difference between an open loop and a closed loop?

A closed loop pins success criteria in advance, checks every step, and has an explicit stop condition - predictable, budget-friendly, won't surprise you. An open loop is given a goal and loose conditions and allowed to explore - it produces novel output but burns tokens and degrades into slop without a strong verifier. Choose based on how much novelty you need versus how much budget you'll risk.

Why is the verifier more important than the model?

Because models are now strong and interchangeable, while the definition of "correct" for your specific problem is not. A weak verifier doesn't fail loudly - it confidently produces garbage hundreds of times. The verifier encodes your domain knowledge and taste, which is the part the model can't supply. It functions like a reward function: define it well and the loop converges on value.

How do I stop an agent loop from running forever?

Pin an explicit stop condition and a bound. Use measurable passes (tests green, an exit code, a score threshold) plus a hard cap like "or stop after 5 rounds." In Claude Code, the /goal command sets a completion condition that a fast model checks after each turn; adding a turn or time clause prevents infinite loops.
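
In a script, the same two bounds look something like this. It's a minimal sketch: `step` is a hypothetical callable that runs one round and returns whether the stop condition holds, and the default limits are arbitrary.

```python
import time
from typing import Callable


def bounded_loop(
    step: Callable[[], bool],       # hypothetical: runs one round, True when done
    max_rounds: int = 5,
    max_seconds: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step() until it reports done, the round cap hits, or time runs out."""
    deadline = clock() + max_seconds
    for _ in range(max_rounds):
        if clock() >= deadline:
            return "stopped: time limit"
        if step():
            return "done"
    return "stopped: round limit"
```

The time check runs before each round, so a single slow round can still overshoot the deadline. If that matters, the round itself needs its own timeout.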



Start Here

Pick one repetitive task this week. Before you write a single instruction, write down what "done" means in measurable terms. Then pin the checks, choose open or closed, and let the loop run against your bar instead of the model's.

For closed-loop templates, verifier checklists, and teardowns of loops that shipped (and loops that burned money), join the AI Builder Club - come ship something real.

Sources & Verification

Firsthand: the workflow and the two-take landing-page example come from running real agent loops in Claude Code in mid-2026, not synthetic benchmarks - treat them as one builder's results, not guarantees. Product behavior (the /goal command and how it evaluates a stop condition) is verified against Anthropic's official docs below.
