The 5 Levels of AI Coding Agents: From Simple Code Review to Fully Autonomous Engineering
Level 1 runs a linter. Level 5 ships features end-to-end with no human in the loop. This guide breaks down each level with real examples, code, and the tools used at each stage — so you know exactly where to start.
What You'll Learn
By the end of this guide, you will:
- Understand the 5 levels of coding agent autonomy — from single-call review bots to fully autonomous engineering agents
- Build Level 1–3 agents with copy-paste Python code (Claude API + GitHub API)
- Architect a Level 4 system — text-to-feature with managed state across multi-step pipelines
- Know where the frontier is — what Level 5 looks like and who's building it today
- Pick the right level for your team's needs with a clear decision framework
Prerequisites: Basic Python. An Anthropic API key and the Python SDK (pip install anthropic). A GitHub account if you want to test Levels 1 and 4.
Time: ~25 minutes to read, ~2 hours if you build Levels 1–3.
Not All Coding Agents Are the Same
When people say "AI coding agent," they might mean anything from a script that posts a GitHub comment to a fully autonomous system that reads a feature spec and opens a pull request. The gap between those two things is enormous.
This guide maps out five levels of coding agents, from the simplest to the most capable. Each level builds on the last. You can stop at any level and ship something useful — or keep climbing.
Level 1: Code Review Agent
What it does: Reads a pull request diff and posts structured review comments via the GitHub API.
Why it's useful: Catches common issues (missing error handling, hardcoded values, obvious bugs) before a human reviewer even looks.
How to build it:
```python
import anthropic
import requests

def review_pr(repo: str, pr_number: int, github_token: str):
    # Fetch the PR metadata
    headers = {"Authorization": f"token {github_token}"}
    pr_url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    pr_data = requests.get(pr_url, headers=headers).json()

    # Get the diff content
    diff_response = requests.get(
        pr_data["diff_url"],
        headers={**headers, "Accept": "application/vnd.github.diff"}
    )
    diff = diff_response.text

    # Ask Claude to review it
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Review this PR diff and list any issues:\n\n{diff}"
        }]
    )
    review_body = response.content[0].text

    # Post the review
    review_url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews"
    requests.post(review_url, headers=headers, json={
        "body": review_body,
        "event": "COMMENT"
    })
```
Key insight: This is a single LLM call with no loop. The AI reads, thinks, and writes. That's all you need at Level 1.
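One practical caveat: a large PR can produce a diff that blows past the model's context window. A minimal guard, as a sketch (the max_chars cutoff here is an arbitrary assumption, not a measured limit):

```python
def truncate_diff(diff: str, max_chars: int = 100_000) -> str:
    """Trim oversized diffs so the review prompt stays within the context window."""
    if len(diff) <= max_chars:
        return diff
    # Keep the head of the diff and flag the truncation so the model knows
    return diff[:max_chars] + "\n\n[diff truncated for length]"
```

Call this on the diff before building the prompt; a fancier version could truncate per-file instead of cutting mid-hunk.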
Level 2: Code Testing Agent
What it does: Reads source code and generates test files (pytest for Python, Jest for TypeScript) automatically.
Why it's useful: Test coverage is the bottleneck on most teams. A testing agent removes the friction of writing boilerplate tests.
How to build it:
````python
def generate_tests(source_file: str, test_framework: str = "pytest"):
    with open(source_file) as f:
        source_code = f.read()

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"""Generate comprehensive {test_framework} tests for this code.
Cover: happy paths, edge cases, error conditions, boundary values.
Return only the test file content, no explanation.

```python
{source_code}
```"""
        }]
    )
    test_content = response.content[0].text

    test_file = source_file.replace(".py", "_test.py")
    with open(test_file, "w") as f:
        f.write(test_content)
    return test_file
````
Key insight: The model has deep knowledge of testing patterns. Your job is to give it the source and get out of the way.
Level 3: Iterative Looping Agent
What it does: Runs the test suite, reads the failures, generates a fix, applies it, runs tests again. Loops until tests pass or it hits a retry limit.
Why it's useful: This is where agents start to feel genuinely powerful. The feedback loop between running code and fixing it is what separates a one-shot generator from an actual agent.
How to build it:
```python
import subprocess

def iterative_fix_agent(source_file: str, max_iterations: int = 5):
    client = anthropic.Anthropic()
    for iteration in range(max_iterations):
        # Run the tests
        result = subprocess.run(
            ["pytest", source_file, "--tb=short", "-q"],
            capture_output=True, text=True
        )
        if result.returncode == 0:
            print(f"✅ All tests passing after {iteration} iterations")
            return True

        # Tests failed — read the current code and failures
        with open(source_file) as f:
            current_code = f.read()
        test_output = result.stdout + result.stderr

        # Ask the model to fix it
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"""This code has failing tests. Fix the code.

Test failures:
{test_output}

Current code:
{current_code}

Return only the corrected code, no explanation."""
            }]
        )
        fixed_code = response.content[0].text

        # Strip markdown fences if present
        if fixed_code.startswith("```"):
            fixed_code = fixed_code.split("\n", 1)[1].rsplit("```", 1)[0]

        with open(source_file, "w") as f:
            f.write(fixed_code)
        print(f"Iteration {iteration + 1}: applied fix, re-running tests...")

    print("❌ Max iterations reached without passing tests")
    return False
```
Key insight: The loop + feedback is what makes this an agent. Without the loop, it's just a code generator. With the loop, it can self-correct.
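The fence-stripping step in the loop above is the fragile part: models sometimes wrap output in a fenced block despite being told not to. A slightly more defensive version, as a sketch:

```python
def strip_code_fences(text: str) -> str:
    """Remove one surrounding markdown code fence, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]          # drop the opening fence (bare or with a language tag)
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]         # drop the closing fence
    return "\n".join(lines)
```

Drop this in place of the inline `split`/`rsplit` and the loop survives outputs with or without fences.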
Level 4: Text-to-Feature Agent
What it does: Takes a plain-English feature spec, writes the implementation, generates tests, opens a pull request — fully autonomous from spec to PR.
Why it matters: This is the level that starts to look like magic to people who haven't seen it. You write "Add a /health endpoint that returns server uptime and version" and 60 seconds later there's a PR ready for review.
Architecture:
- Spec parser — Claude reads the spec and produces a structured implementation plan (files to create/modify, function signatures, test cases)
- Code generator — For each file in the plan, Claude writes the implementation
- Test generator — Level 2 agent runs on the new code
- Iterative fixer — Level 3 agent runs until tests pass
- PR creator — Git commits the changes, opens PR with auto-generated description
The key addition at Level 4: The agent manages its own context. It knows which files it's changed, what tests it's written, and what still needs to happen. This is typically done with a structured state object passed through each step.
```python
state = {
    "spec": original_spec,
    "plan": parsed_plan,
    "files_written": [],
    "tests_written": [],
    "tests_passing": False,
    "pr_url": None
}
```
Each step reads and updates this state. The agent knows where it is in the pipeline at all times.
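To make the shape concrete, here's a toy pipeline runner. The two steps shown are hypothetical stubs standing in for the real Claude-backed implementations, not working code for any of the levels above:

```python
def run_pipeline(spec: str, steps: list) -> dict:
    """Thread one shared state dict through each pipeline step in order."""
    state = {
        "spec": spec,
        "plan": None,
        "files_written": [],
        "tests_written": [],
        "tests_passing": False,
        "pr_url": None,
    }
    for step in steps:
        state = step(state)   # each step reads and updates the state
    return state

# Stub steps: the real versions would call Claude, run pytest, and hit the GitHub API
def parse_spec(state: dict) -> dict:
    state["plan"] = {"files": ["health.py"], "tests": ["health_test.py"]}
    return state

def write_code(state: dict) -> dict:
    state["files_written"] = state["plan"]["files"]
    return state

final = run_pipeline("Add a /health endpoint", [parse_spec, write_code])
```

Because every step takes and returns the same dict, you can reorder, retry, or resume steps without any step needing to know about the others.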
Level 5: The Future (Autonomous Engineering Agent)
What it does: Monitors the issue tracker, triages incoming bugs, implements fixes, communicates with stakeholders, and learns from the outcomes of its PRs.
What it looks like in practice:
- A new GitHub issue is opened: "Button click causes 500 error on mobile"
- The agent reads the issue, reproduces the error using browser automation, traces the stack, identifies the root cause, writes a fix, tests it, opens a PR, and comments on the issue with a plain-English explanation for the reporter
- If the PR reviewer requests changes, the agent reads the review, applies the requested changes, and re-requests review
What makes Level 5 different:
- Long-horizon planning: Tasks span hours or days, not seconds
- Stakeholder communication: The agent writes to humans (issue comments, PR descriptions, Slack messages) as part of its workflow
- Learning from feedback: PR reviews, merge/reject decisions, and production metrics feed back into the agent's future decisions
- Multi-agent coordination: Several specialized sub-agents (reproducer, debugger, implementer, reviewer) coordinate on a single task
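That coordination pattern can be sketched as a plain dispatch loop, with each sub-agent as a function that sees everything produced so far. All names and behaviors below are illustrative stubs, not a real system:

```python
def run_subagents(issue: str, agents: dict) -> dict:
    """Hand the issue to each specialized sub-agent in order, accumulating results."""
    results = {"issue": issue}
    for name in ("reproducer", "debugger", "implementer", "reviewer"):
        results[name] = agents[name](results)   # each agent sees all prior outputs
    return results

# Illustrative stubs; real sub-agents would each be a full LLM-driven loop
agents = {
    "reproducer": lambda r: f"reproduced: {r['issue']}",
    "debugger": lambda r: "root cause identified",
    "implementer": lambda r: "patch written",
    "reviewer": lambda r: "approved",
}

outcome = run_subagents("Button click causes 500 error on mobile", agents)
```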
Where we are today: Level 5 exists in research and early production at a few companies (Cognition's Devin, GitHub Copilot Workspace). Most teams are building at Level 3–4.
Choosing Your Level
| Level | Complexity | Autonomy | Best for |
|---|---|---|---|
| 1 | Low | None | Teams that want AI in their PR workflow today |
| 2 | Low | None | Boosting test coverage without manual effort |
| 3 | Medium | Partial | Fixing known test failures automatically |
| 4 | High | High | Automating entire feature implementations |
| 5 | Very high | Full | Research / frontier engineering teams |
Start at Level 1. Ship it. See what breaks. Move to Level 2 when you're ready.
Try These Now (3 Exercises)
Exercise 1 — Build Level 1 (30 min): Copy the code review agent above. Create a test repo with a deliberately buggy PR (missing error handling, hardcoded secrets, unused imports). Run the agent against it. Evaluate: did it catch the real issues? Did it hallucinate any?
Exercise 2 — Build Level 3 (45 min): Write a small Python function with a known bug (e.g., off-by-one error in a loop). Write a test that catches it. Then run the iterative fix agent from Level 3 and watch it self-correct. Experiment with max_iterations — what happens at 1 vs. 3 vs. 10?
Exercise 3 — Design Level 4 (30 min, no code): Write a plain-English feature spec: "Add a /health endpoint that returns {status: 'ok', uptime_seconds: N, version: '1.0.0'}." Then sketch the state object the Level 4 agent would use. What fields does it need? What's the sequence of steps? What happens if step 3 fails?
Key Takeaways
- Level 1 (review) and Level 2 (test generation) are single LLM calls — ship them in an afternoon
- Level 3 adds the loop: run → fail → fix → rerun. This is where agents become genuinely useful
- Level 4 adds multi-step planning and state management — the agent tracks where it is in a pipeline
- Level 5 is long-horizon autonomy with stakeholder communication — frontier territory, not for most teams yet
- The feedback loop is the key differentiator — without it, you have a generator, not an agent
What's Next
Recommended next courses:
- AI Agents 101 — Part 1: What Is an AI Agent? — the foundational mental model and loop pattern used in every level above
- AI Agents 101 — Part 2: Give Your Agent Real Tools — error handling, web search, and code execution tools
- MCP 101: Build MCP Servers — connect your coding agents to GitHub, databases, and any API via a standard protocol
If you want to build alongside other serious AI builders:
Get the free AI Builder Newsletter
Weekly deep-dives on AI tools, automation workflows, and builder strategies. Join 5,000+ readers.
No spam. Unsubscribe anytime.
Go deeper with AI Builder Club
Join 1,000+ ambitious professionals and builders learning to use AI at work.
- ✓Expert-led courses on Cursor, MCP, AI agents, and more
- ✓Weekly live workshops with industry builders
- ✓Private community for feedback, collaboration, and accountability