Prompt vs Context vs Harness Engineering: The 3 Shifts
AI engineering moved from phrasing prompts to supplying information to controlling execution. What each layer solves and why each hit a ceiling.
Three terms in two years: Prompt Engineering, Context Engineering, Harness Engineering. It looks like fashion. It's actually a ratchet. Each term took over when task complexity broke the previous one - and each corresponds to a progressively harder question:
- Does the model understand what you're asking?
- Does the model have the right information?
- Does the model keep doing the right thing across a long, real execution?
Trace the ratchet and you understand how AI systems went from "can chat" to "can ship." Miss it and you're optimizing the wrong layer - polishing prompts when your problem is information supply, or tuning retrieval when your problem is nobody's checking the output.

Shift 1: Prompt Engineering - Say It Better
The founding observation, circa GPT-3: same model, different phrasing, wildly different output. "Summarize this article" gets you mush; "As a senior tech editor, summarize in three paragraphs - core claim, evidence, limitations, max 150 words each" gets you something publishable. The toolkit crystallized fast: role assignment, few-shot examples, step-by-step decomposition, output format contracts, refusal boundaries.
Why it works is worth being precise about: an LLM is a context-sensitive probability machine. A role shifts the sampling distribution toward the way that persona writes in the training data. Examples establish a pattern to continue. Explicit constraints make compliant continuations more likely. Prompting isn't commanding - it's shaping the probability space the answer gets drawn from. The skill that mattered was language design: knowing the model's temperament.
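A minimal sketch of that toolkit as code, assuming a hypothetical `PromptSpec` container and `build_prompt` helper (neither comes from any library). It only shows how role, examples, format contract, and refusal boundaries compose into one string; the wording of each piece is still where the skill lives.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt-engineering toolkit pieces."""
    role: str
    task: str
    output_format: str
    examples: list[tuple[str, str]] = field(default_factory=list)
    refusals: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec, user_input: str) -> str:
    """Assemble the pieces into one prompt string, in a fixed order."""
    parts = [f"You are {spec.role}.", f"Task: {spec.task}"]
    # Few-shot examples give the model a pattern to continue.
    for sample_in, sample_out in spec.examples:
        parts.append(f"Example input:\n{sample_in}\nExample output:\n{sample_out}")
    # The format contract and refusal boundaries come after the examples.
    parts.append(f"Output format: {spec.output_format}")
    for rule in spec.refusals:
        parts.append(f"Do not: {rule}")
    parts.append(f"Input:\n{user_input}")
    return "\n\n".join(parts)


spec = PromptSpec(
    role="a senior tech editor",
    task="Summarize the article.",
    output_format="Three paragraphs: core claim, evidence, limitations. Max 150 words each.",
    refusals=["invent facts that are not in the article"],
)
prompt = build_prompt(spec, "ARTICLE TEXT GOES HERE")
print(prompt)
```

Nothing here adds information the model lacks; it only changes how the request is expressed.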
The ceiling: prompts can't conjure facts. "Analyze our internal architecture doc" fails even with a perfectly phrased prompt if the doc isn't in the window. Prompt engineering solves the expression problem. It cannot solve the information problem - and most real tasks are information problems wearing expression-problem costumes. The moment work shifted from open-ended Q&A to "do something with my data," the center of gravity moved.
Shift 2: Context Engineering - Feed It Better
New default assumption: the model probably doesn't know - the system must deliver the right information at call time. And the question set changed shape entirely: What does the model currently see? What's missing? What should be summarized versus quoted versus excluded? What does this module need to see that that one shouldn't?
What forced the shift was agents. A chat turn is one prompt; an agent run can be ~50 tool calls, each spraying results, errors, and state into a finite window. Context became a managed resource with real failure modes - rot, poisoning, distraction, clash - and a real discipline grew around managing it.
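To make "managed resource" concrete, here is a rough sketch of one common tactic: keep the task statement, then keep only the newest tool results that fit a token budget. The names `estimate_tokens` and `fit_to_budget` are hypothetical, and the 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the first message (the task) plus the most recent ones that fit."""
    if not messages:
        return []
    task, history = messages[0], messages[1:]
    remaining = budget - estimate_tokens(task)
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break  # everything older than this is dropped
        kept.append(message)
        remaining -= cost
    return [task] + list(reversed(kept))  # restore chronological order


# Simulated agent run: one task plus 50 tool results.
run = ["Fix the failing test in utils.py"] + [
    f"tool result {i}: " + "x" * 200 for i in range(50)
]
window = fit_to_budget(run, budget=500)
print(len(run), "->", len(window))
```

Dropping old results is the bluntest option; summarizing them instead is a design choice this sketch leaves out.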
Two landmark practices define the era. RAG answered "how do facts the model never trained on get in?" - retrieve, rank, inject, with all the craft living in chunking and reranking. Agent Skills answered the subtler capability-overload problem with progressive disclosure: a ~50-token metadata layer always loaded, ~500-token instructions loaded on trigger, scripts and references loaded only at execution. Need-to-know, applied to machine attention.
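The three-tier loading described above can be sketched roughly as follows. The `Skill` class, its field names, and `render_context` are hypothetical illustrations, not the actual Agent Skills format; the point is only that each tier enters the window at a different moment.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical skill with three tiers of detail."""
    name: str
    summary: str        # tier 1: always in the window
    instructions: str   # tier 2: loaded when the skill is triggered
    script: str         # tier 3: loaded only at execution time


def render_context(skills: list[Skill], triggered: set[str], executing: set[str]) -> str:
    """Emit only the tiers each skill has earned at this point in the run."""
    lines = []
    for skill in skills:
        lines.append(f"[skill] {skill.name}: {skill.summary}")
        if skill.name in triggered or skill.name in executing:
            lines.append(f"[instructions] {skill.instructions}")
        if skill.name in executing:
            lines.append(f"[script] {skill.script}")
    return "\n".join(lines)


skills = [
    Skill("pdf", "Fill PDF forms", "Use the form tool, then validate fields", "fill.py"),
    Skill("sql", "Query the warehouse", "Use the read-only role", "query.py"),
]
idle = render_context(skills, triggered=set(), executing=set())
active = render_context(skills, triggered={"pdf"}, executing=set())
running = render_context(skills, triggered=set(), executing={"pdf"})
print(running)
```

An idle window carries one line per skill, so adding skills costs little until one is actually needed.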
Worth saying plainly: context engineering contains prompt engineering - the prompt is just one (curated) object in the window. The layers nest; they don't compete.
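A small sketch of that nesting, folding in the retrieve, rank, inject pattern: the instruction is one string among several that get assembled into the window. `score` and `build_window` are hypothetical names, and word overlap is a toy stand-in for a real retriever and reranker.

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared lowercase words (stand-in for a real ranker)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def build_window(instruction: str, query: str, chunks: list[str], top_k: int = 2) -> str:
    """Retrieve and rank chunks, then inject the best ones next to the prompt."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    # Keep at most top_k chunks, and none that share no words with the query.
    relevant = [c for c in ranked[:top_k] if score(query, c) > 0]
    facts = "\n".join(f"- {c}" for c in relevant)
    return f"{instruction}\n\nRetrieved facts:\n{facts}\n\nQuestion: {query}"


chunks = [
    "The billing service calls the ledger API",
    "Lunch menu for Friday",
    "The ledger API is rate limited",
]
window_text = build_window(
    instruction="Answer using only the retrieved facts.",
    query="how does billing use the ledger API",
    chunks=chunks,
)
print(window_text)
```

The instruction line is the prompt-engineering artifact; everything else in the window is the context pipeline's job.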
The ceiling: perfect inputs, unsupervised execution. The model can have every fact and still fail: it plans well then drifts at step 7, misreads a tool result and builds on the misreading, errors at step 3 and compounds the error through step 30, or reports confident completion on work that doesn't run. Input quality was never the whole game - because nobody was watching the work happen.
Shift 3: Harness Engineering - Control the Run
The word is literal: a harness is the rigging that turns an animal's raw power into directed, recoverable work. LangChain's definition is the cleanest in circulation:
Agent = Model + Harness. Harness = Agent − Model.
Everything around the weights: what the model sees (context), what it can touch (tools), how steps sequence, what persists (state and memory), who checks the output (evaluation), and what happens at failure (constraints, recovery). It's the answer to questions the first two layers never asked: who supervises? who verifies? who pulls it back on course?
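A minimal sketch of the supervising part of a harness, under loud assumptions: `run_with_harness`, `act`, and `verify` are hypothetical names, the "model" is a plain callable, and real harnesses add tools, memory, and richer recovery. It shows sequencing, persisted state, an independent check, and a halt on repeated failure.

```python
from typing import Callable


def run_with_harness(
    steps: list[str],
    act: Callable[[str, list[str]], str],
    verify: Callable[[str, str], bool],
    max_retries: int = 2,
) -> dict:
    """Run each step, check its result independently, retry or halt on failure."""
    state: list[str] = []  # verified results that persist across steps
    for step in steps:
        for _attempt in range(1 + max_retries):
            result = act(step, state)
            if verify(step, result):
                state.append(result)
                break
        else:
            # No attempt passed verification: stop instead of building on a bad result.
            return {"status": "halted", "failed_step": step, "state": state}
    return {"status": "done", "state": state}


calls = {"count": 0}


def fake_act(step: str, state: list[str]) -> str:
    """Simulated model: its first attempt at the "test" step returns a wrong result."""
    calls["count"] += 1
    if step == "test" and calls["count"] == 2:
        return "oops"
    return f"{step}: ok"


def fake_verify(step: str, result: str) -> bool:
    """Simulated checker that does not trust the model's own claim of success."""
    return result == f"{step}: ok"


outcome = run_with_harness(["plan", "test", "ship"], fake_act, fake_verify)
print(outcome["status"], outcome["state"])
```

The design choice that matters is that `verify` sits outside `act`: a step counts as finished only when the checker agrees, so an error at one step is caught there rather than compounded.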
The hiring analogy lands best. You brief a new hire on an important client visit (prompt). You hand them the account history and pricing sheet (context). But if the meeting matters, you also send a checklist, require a call at key milestones, review the recording after, and check results against criteria. That last bundle - nobody skips it for high-stakes work with humans. Harness engineering is refusing to skip it for agents.
And the receipts arrived fast. The detailed case studies live in the production practices guide, but the headline: OpenAI ran a near-million-line production app where agents wrote 100% of the code and the humans engineered the environment; Anthropic got Claude running unattended for hours via fresh-context resets and independent evaluator agents.
LangChain's Terminal Bench jump, Top 30 → Top 5: same model, only the harness changed. Same weights, different rigging, different league.
The Ratchet, In One Table
| | Prompt Eng. | Context Eng. | Harness Eng. |
|---|---|---|---|
| Object | The instruction | The input environment | The execution system |
| Core question | Did I say it clearly? | Does it see the right info? | Does it keep doing it right? |
| Failure it fights | Misunderstanding | Missing/noisy knowledge | Drift, error compounding, false "done" |
| Era trigger | GPT-3 chat | RAG + early agents | Long-running autonomous work |
| Skill that matters | Language design | Information architecture | Systems + verification design |
Each layer contains the previous: the prompt is an object inside the context; the context pipeline is a subsystem inside the harness. Which is why nothing here is obsolete - a sloppy prompt still hurts inside the best harness ever built. The layers are floors of one building, and complexity is the elevator.
Diagnostic, for daily use: output misunderstands the ask → prompt problem. Output is fluent but wrong or stale → context problem. Output starts right and degrades across steps, or claims success falsely → harness problem. Most "the model is bad" complaints are a mislabeled floor.
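The diagnostic can be written down as a tiny lookup, sketched here with hypothetical names (`Layer`, `diagnose`). Deciding which symptom you are actually seeing remains a human judgment; the code only encodes the mapping.

```python
from enum import Enum


class Layer(Enum):
    PROMPT = "prompt"
    CONTEXT = "context"
    HARNESS = "harness"


def diagnose(
    misunderstands_ask: bool = False,
    fluent_but_wrong_or_stale: bool = False,
    degrades_or_false_done: bool = False,
) -> list[Layer]:
    """Map observed symptoms to the layer(s) most likely responsible."""
    suspects = []
    if misunderstands_ask:
        suspects.append(Layer.PROMPT)
    if fluent_but_wrong_or_stale:
        suspects.append(Layer.CONTEXT)
    if degrades_or_false_done:
        suspects.append(Layer.HARNESS)
    return suspects


suspects = diagnose(degrades_or_false_done=True)
print([layer.value for layer in suspects])
```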
The arc of all three shifts compresses to one sentence: the engineering moved from talking to the model, to informing the model, to building the machine around the model - and the next article breaks that machine into its six load-bearing components. The model is the engine. Engines don't win races. Cars do.