#ai-agents#harness-engineering#context-engineering#prompt-engineering#loop-engineering

Prompt vs Context vs Harness vs Loop Engineering: The 4 Shifts

AI engineering moved from prompts to context to harnesses to loops that run without you. What each layer solves and why each hit a ceiling.

ShirleyJune 11, 2026Updated July 2, 20268 min read

Course outline · Build AI Agents (4.3)

Four terms in three years: Prompt Engineering, Context Engineering, Harness Engineering, and now Loop Engineering. It looks like fashion. It's actually a ratchet. Each term took over when task complexity broke the previous one - and each corresponds to a progressively harder question:

Does the model understand what you're asking?
Does the model have the right information?
Does the model keep doing the right thing across a long, real execution?
Does the work keep happening when you're not there to ask?

Trace the ratchet and you understand how AI systems went from "can chat" to "can ship." Miss it and you're optimizing the wrong layer - polishing prompts when your problem is information supply, or tuning retrieval when your problem is nobody's checking the output.

Shift 1: Prompt Engineering - Say It Better

The founding observation, circa GPT-3: same model, different phrasing, wildly different output. "Summarize this article" gets you mush; "As a senior tech editor, summarize in three paragraphs - core claim, evidence, limitations, max 150 words each" gets you something publishable. The toolkit crystallized fast: role assignment, few-shot examples, step-by-step decomposition, output format contracts, refusal boundaries.

Why it works is worth being precise about: an LLM is a context-sensitive probability machine. A role shifts the sampling distribution toward that persona's training data. Examples establish a pattern to continue. Constraints raise the weight of compliance. Prompting isn't commanding - it's shaping the probability space the answer gets drawn from. The skill that mattered was language design: knowing the model's temperament.

The ceiling: prompts can't conjure facts. "Analyze our internal architecture doc" fails on phrasing-perfect prompts if the doc isn't there. Prompt engineering solves the expression problem. It cannot solve the information problem - and most real tasks are information problems wearing expression-problem costumes. The moment work shifted from open-ended Q&A to "do something with my data," the center of gravity moved.

Shift 2: Context Engineering - Feed It Better

New default assumption: the model probably doesn't know - the system must deliver the right information at call time. And the question set changed shape entirely: What does the model currently see? What's missing? What should be summarized versus quoted versus excluded? What does this module need to see that that one shouldn't?

What forced the shift was agents. A chat turn is one prompt; an agent run is ~50 tool calls, each spraying results, errors, and state into a finite window. Context became a managed resource with real failure modes - rot, poisoning, distraction, clash - and a real discipline grew around managing it.

Two landmark practices define the era. RAG answered "how do facts the model never trained on get in?" - retrieve, rank, inject, with all the craft living in chunking and reranking. Agent Skills answered the subtler capability-overload problem with progressive disclosure: a ~50-token metadata layer always loaded, ~500-token instructions loaded on trigger, scripts and references loaded only at execution. Need-to-know, applied to machine attention.

Worth saying plainly: context engineering contains prompt engineering - the prompt is just one (curated) object in the window. The layers nest; they don't compete.

The ceiling: perfect inputs, unsupervised execution. The model has every fact and still: plans well then drifts at step 7, misreads a tool result and builds on the misreading, errors at step 3 and compounds it through step 30, reports confident completion on work that doesn't run. Input quality was never the whole game - because nobody was watching the work happen.

Free AI Builder Newsletter

Weekly guides on AI tools & builder strategies.

Shift 3: Harness Engineering - Control the Run

The word is literal: a harness is the rigging that turns an animal's raw power into directed, recoverable work. LangChain's definition is the cleanest in circulation:

Agent = Model + Harness. Harness = Agent − Model.

Everything around the weights: what the model sees (context), what it can touch (tools), how steps sequence, what persists (state and memory), who checks the output (evaluation), and what happens at failure (constraints, recovery). It's the answer to questions the first two layers never asked: who supervises? who verifies? who pulls it back on course?

The hiring analogy lands best. You brief a new hire on an important client visit (prompt). You hand them the account history and pricing sheet (context). But if the meeting matters, you also send a checklist, require a call at key milestones, review the recording after, and check results against criteria. That last bundle - nobody skips it for high-stakes work with humans. Harness engineering is refusing to skip it for agents.

And the receipts arrived fast. The detailed case studies live in the production practices guide, but the headline: OpenAI ran a near-million-line production app where agents wrote 100% of the code and the humans engineered the environment; Anthropic got Claude running unattended for hours via fresh-context resets and independent evaluator agents.

Top 30 → Top 5

LangChain's Terminal Bench jump - same model, only the harness changed. Same weights, different rigging, different league.

The ceiling: a perfect harness still controls one run - and you are still the trigger for every run. You decide when to start, you read the result, you decide what to do next. The agent got supervised; the workflow didn't get autonomous. For a single high-stakes task that's fine. But the moment you have recurring work - triage every new issue, refresh the SEO pages weekly, keep the test suite green - you become the cron job. The bottleneck moved one last time: from the machine's execution to your attention.

Shift 4: Loop Engineering - Remove Yourself From the Run

The fourth shift got its name in June 2026, and unusually, you can watch it happen in real time. Peter Steinberger (OpenClaw's creator) posted that you shouldn't be prompting coding agents anymore - you should be designing the loops that prompt them - and the post crossed 6.5 million views in days. Boris Cherny, who created Claude Code, described his own workflow the same way: "I don't prompt Claude anymore. I have loops that are running... My job is to write loops." Addy Osmani gave the pattern its name, and within three weeks Andrew Ng had mapped his own three nested loops for 0-to-1 product building. When the people building the tools and the person who taught half the industry ML converge on one frame in a month, the frame is load-bearing.

The definition is compact: loop engineering is designing the system that prompts, checks, remembers, and re-runs the agent - so you don't have to. The prompt still exists; a machine writes it now. The harness still runs; a loop decides when it runs and whether the result is done. Instead of issuing instructions, you define a goal with a testable termination condition, a verifier that decides "good enough," a trigger (cron, webhook, another agent), and an exit for failure. Then you walk away.

The cleanest way to keep the last two layers straight: the loop defines what the agent does and when; the harness defines where it runs, what it can touch, and how it recovers. The loop is the choreography, the harness is the stage. You need both - a loop without a harness is an unattended agent with root access, and a harness without a loop is a very safe system waiting for you to press the button.

One honest caveat, because this layer is young: it's the first shift where the failure mode is measured in dollars, not just quality. An unattended loop with a weak verifier doesn't produce a bad answer - it produces bad answers all night, at token prices. Uber reportedly capped engineers at $1,500/month for agent tooling after burning its annual AI budget in four months. The skeptics calling this "a while loop with a marketing budget" are wrong about the substance but right about the risk: the whole discipline collapses without halt conditions and cost control. The full treatment - open vs closed loops, why the verifier is the bottleneck, and a starting checklist - lives in the loop engineering guide.

The Ratchet, In One Table

	Prompt Eng.	Context Eng.	Harness Eng.	Loop Eng.
Object	The instruction	The input environment	The execution system	The recurring system
Core question	Did I say it clearly?	Does it see the right info?	Does it keep doing it right?	Does it run without me?
Failure it fights	Misunderstanding	Missing/noisy knowledge	Drift, error compounding, false "done"	Human-as-bottleneck, runaway cost
Era trigger	GPT-3 chat	RAG + early agents	Long-running autonomous work	Cheap agents + June 2026 naming
Skill that matters	Language design	Information architecture	Systems + verification design	Goal, verifier + termination design

Each layer contains the previous: the prompt is an object inside the context; the context pipeline is a subsystem inside the harness; the harness is the run inside the loop. Which is why nothing here is obsolete - a sloppy prompt still hurts inside the best loop ever built. The layers are floors of one building, and complexity is the elevator.

Diagnostic, for daily use: output misunderstands the ask → prompt problem. Output is fluent but wrong or stale → context problem. Output starts right and degrades across steps, or claims success falsely → harness problem - usually a weak verifier, the half of the loop that decides "good enough." Output is fine but nothing happens unless you personally kick it off → loop problem: you're the cron job. Most "the model is bad" complaints are a mislabeled floor.

The arc of all four shifts compresses to one sentence: the engineering moved from talking to the model, to informing the model, to building the machine around the model, to designing the system that runs the machine without you - and the next articles break that down: the harness into its six load-bearing components, the loop into its verifier-first design. The model is the engine. Engines don't win races. Cars do. And in 2026, the cars started driving their own laps.

The fifth shift is already forming. In mid-July 2026 the discussion moved one rung further out: once you're good at running one agent in a loop, the next step is coordinating many of them as nodes in a graph. That layer is graph engineering. If you'd rather see these shifts as a clean, current map than a history, The 5 Layers of AI Engineering lays out the whole stack - prompt, context, harness, loop, graph - and helps you find the one layer that's actually your bottleneck.

Open source · free

AI Builder Club Skills

The harness and loop tooling from the last two shifts is open and free. /setup-codebase-harness and /new-loop get you running in one command each.

View on GitHub →

Sources & Verification

This guide is written from hands-on testing, then cross-checked against primary sources - official documentation and first-party announcements. Field results and opinions are labeled as such. See our editorial standards.