The 5 Levels of AI Coding Agents: From Simple Code Review to Fully Autonomous Engineering
Level 1 runs a linter. Level 5 ships features end-to-end with no human in the loop. This guide breaks down each level with real examples, code, and the tools used at each stage — so you know exactly where to start.
What You'll Learn
By the end of this guide, you will:
- Understand the 5 levels of coding agent autonomy — from single-call review bots to fully autonomous engineering agents
- Build Level 1–3 agents with copy-paste Python code (Claude API + GitHub API)
- Architect a Level 4 system — text-to-feature with managed state across multi-step pipelines
- Know where the frontier is — what Level 5 looks like and who's building it today
- Pick the right level for your team's needs with a clear decision framework
Prerequisites: Basic Python. An Anthropic API key and the Python SDK (pip install anthropic). A GitHub account if you want to test Levels 1 and 4.
Time: ~25 minutes to read, ~2 hours if you build Levels 1–3.
Not All Coding Agents Are the Same
When people say "AI coding agent," they might mean anything from a script that posts a GitHub comment to a fully autonomous system that reads a feature spec and opens a pull request. The gap between those two things is enormous.
This guide maps out five levels of coding agents, from the simplest to the most capable. Each level builds on the last. You can stop at any level and ship something useful — or keep climbing.
Level 1: Code Review Agent
What it does: Reads a pull request diff and posts structured review comments via the GitHub API.
Why it's useful: Catches common issues (missing error handling, hardcoded values, obvious bugs) before a human reviewer even looks.
How to build it:
```python
import anthropic
import requests

def review_pr(repo: str, pr_number: int, github_token: str):
    # Fetch the PR metadata
    headers = {"Authorization": f"token {github_token}"}
    pr_url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    pr_data = requests.get(pr_url, headers=headers).json()

    # Get the diff content
    diff_response = requests.get(
        pr_data["diff_url"],
        headers={**headers, "Accept": "application/vnd.github.diff"}
    )
    diff = diff_response.text

    # Ask Claude to review it
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Review this PR diff and list any issues:\n\n{diff}"
        }]
    )
    review_body = response.content[0].text

    # Post the review
    review_url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews"
    requests.post(review_url, headers=headers, json={
        "body": review_body,
        "event": "COMMENT"
    })
```
Key insight: This is a single LLM call with no loop. The AI reads, thinks, and writes. That's all you need at Level 1.
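One practical caveat: a large PR can produce a diff that blows past the model's context window. A minimal guard, as a sketch (the max_chars cutoff here is an arbitrary assumption, not a measured limit):

```python
def truncate_diff(diff: str, max_chars: int = 100_000) -> str:
    """Trim oversized diffs so the review prompt stays within the context window."""
    if len(diff) <= max_chars:
        return diff
    # Keep the head of the diff and flag the truncation so the model knows
    return diff[:max_chars] + "\n\n[diff truncated for length]"
```

Call this on the diff before building the prompt; a fancier version could truncate per-file instead of cutting mid-hunk.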
Level 2: Code Testing Agent
What it does: Reads source code and generates test files (pytest for Python, Jest for TypeScript) automatically.
Why it's useful: Test coverage is the bottleneck on most teams. A testing agent removes the friction of writing boilerplate tests.
How to build it:
````python
def generate_tests(source_file: str, test_framework: str = "pytest"):
    with open(source_file) as f:
        source_code = f.read()

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"""Generate comprehensive {test_framework} tests for this code.
Cover: happy paths, edge cases, error conditions, boundary values.
Return only the test file content, no explanation.

```python
{source_code}
```"""
        }]
    )
    test_content = response.content[0].text

    test_file = source_file.replace(".py", "_test.py")
    with open(test_file, "w") as f:
        f.write(test_content)
    return test_file
````
Key insight: The model has deep knowledge of testing patterns. Your job is to give it the source and get out of the way.
Level 3: Iterative Looping Agent
What it does: Runs the test suite, reads the failures, generates a fix, applies it, runs tests again. Loops until tests pass or it hits a retry limit.
Why it's useful: This is where agents start to feel genuinely powerful. The feedback loop between running code and fixing it is what separates a one-shot generator from an actual agent.
How to build it:
```python
import subprocess

def iterative_fix_agent(source_file: str, max_iterations: int = 5):
    client = anthropic.Anthropic()
    for iteration in range(max_iterations):
        # Run the tests
        result = subprocess.run(
            ["pytest", source_file, "--tb=short", "-q"],
            capture_output=True, text=True
        )
        if result.returncode == 0:
            print(f"✅ All tests passing after {iteration} iterations")
            return True

        # Tests failed — read the current code and failures
        with open(source_file) as f:
            current_code = f.read()
        test_output = result.stdout + result.stderr

        # Ask the model to fix it
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"""This code has failing tests. Fix the code.

Test failures:
{test_output}

Current code:
{current_code}

Return only the corrected code, no explanation."""
            }]
        )
        fixed_code = response.content[0].text

        # Strip markdown fences if present
        if fixed_code.startswith("```"):
            fixed_code = fixed_code.split("\n", 1)[1].rsplit("```", 1)[0]

        with open(source_file, "w") as f:
            f.write(fixed_code)
        print(f"Iteration {iteration + 1}: applied fix, re-running tests...")

    print("❌ Max iterations reached without passing tests")
    return False
```
Key insight: The loop + feedback is what makes this an agent. Without the loop, it's just a code generator. With the loop, it can self-correct.
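The fence-stripping step in the loop above is the fragile part: models sometimes wrap output in a fenced block despite being told not to. A slightly more defensive version, as a sketch:

```python
def strip_code_fences(text: str) -> str:
    """Remove one surrounding markdown code fence, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]          # drop the opening fence (bare or with a language tag)
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]         # drop the closing fence
    return "\n".join(lines)
```

Drop this in place of the inline `split`/`rsplit` and the loop survives outputs with or without fences.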
Level 4: Text-to-Feature Agent
What it does: Takes a plain-English feature spec, writes the implementation, generates tests, opens a pull request — fully autonomous from spec to PR.
Why it matters: This is the level that starts to look like magic to people who haven't seen it. You write "Add a /health endpoint that returns server uptime and version" and 60 seconds later there's a PR ready for review.
Architecture:
- Spec parser — Claude reads the spec and produces a structured implementation plan (files to create/modify, function signatures, test cases)
- Code generator — For each file in the plan, Claude writes the implementation
- Test generator — Level 2 agent runs on the new code
- Iterative fixer — Level 3 agent runs until tests pass
- PR creator — Git commits the changes, opens PR with auto-generated description
The key addition at Level 4: The agent manages its own context. It knows which files it's changed, what tests it's written, and what still needs to happen. This is typically done with a structured state object passed through each step.
```python
state = {
    "spec": original_spec,
    "plan": parsed_plan,
    "files_written": [],
    "tests_written": [],
    "tests_passing": False,
    "pr_url": None
}
```
Each step reads and updates this state. The agent knows where it is in the pipeline at all times.
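To make the shape concrete, here's a toy pipeline runner. The two steps shown are hypothetical stubs standing in for the real Claude-backed implementations, not working code for any of the levels above:

```python
def run_pipeline(spec: str, steps: list) -> dict:
    """Thread one shared state dict through each pipeline step in order."""
    state = {
        "spec": spec,
        "plan": None,
        "files_written": [],
        "tests_written": [],
        "tests_passing": False,
        "pr_url": None,
    }
    for step in steps:
        state = step(state)   # each step reads and updates the state
    return state

# Stub steps: the real versions would call Claude, run pytest, and hit the GitHub API
def parse_spec(state: dict) -> dict:
    state["plan"] = {"files": ["health.py"], "tests": ["health_test.py"]}
    return state

def write_code(state: dict) -> dict:
    state["files_written"] = state["plan"]["files"]
    return state

final = run_pipeline("Add a /health endpoint", [parse_spec, write_code])
```

Because every step takes and returns the same dict, you can reorder, retry, or resume steps without any step needing to know about the others.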
Level 5: The Future (Autonomous Engineering Agent)
What it does: Monitors the issue tracker, triages incoming bugs, implements fixes, communicates with stakeholders, and learns from the outcomes of its PRs.
What it looks like in practice:
- A new GitHub issue is opened: "Button click causes 500 error on mobile"
- The agent reads the issue, reproduces the error using browser automation, traces the stack, identifies the root cause, writes a fix, tests it, opens a PR, and comments on the issue with a plain-English explanation for the reporter
- If the PR reviewer requests changes, the agent reads the review, applies the requested changes, and re-requests review
What makes Level 5 different:
- Long-horizon planning: Tasks span hours or days, not seconds
- Stakeholder communication: The agent writes to humans (issue comments, PR descriptions, Slack messages) as part of its workflow
- Learning from feedback: PR reviews, merge/reject decisions, and production metrics feed back into the agent's future decisions
- Multi-agent coordination: Several specialized sub-agents (reproducer, debugger, implementer, reviewer) coordinate on a single task
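That coordination pattern can be sketched as a plain dispatch loop, with each sub-agent as a function that sees everything produced so far. All names and behaviors below are illustrative stubs, not a real system:

```python
def run_subagents(issue: str, agents: dict) -> dict:
    """Hand the issue to each specialized sub-agent in order, accumulating results."""
    results = {"issue": issue}
    for name in ("reproducer", "debugger", "implementer", "reviewer"):
        results[name] = agents[name](results)   # each agent sees all prior outputs
    return results

# Illustrative stubs; real sub-agents would each be a full LLM-driven loop
agents = {
    "reproducer": lambda r: f"reproduced: {r['issue']}",
    "debugger": lambda r: "root cause identified",
    "implementer": lambda r: "patch written",
    "reviewer": lambda r: "approved",
}

outcome = run_subagents("Button click causes 500 error on mobile", agents)
```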
Where we are today: Level 5 exists in research and early production at a few companies (Cognition's Devin, GitHub Copilot Workspace). Most teams are building at Level 3–4.
Choosing Your Level
| Level | Complexity | Autonomy | Best for |
|---|---|---|---|
| 1 | Low | None | Teams that want AI in their PR workflow today |
| 2 | Low | None | Boosting test coverage without manual effort |
| 3 | Medium | Partial | Fixing known test failures automatically |
| 4 | High | High | Automating entire feature implementations |
| 5 | Very high | Full | Research / frontier engineering teams |
Start at Level 1. Ship it. See what breaks. Move to Level 2 when you're ready.
Try These Now (3 Exercises)
Exercise 1 — Build Level 1 (30 min): Copy the code review agent above. Create a test repo with a deliberately buggy PR (missing error handling, hardcoded secrets, unused imports). Run the agent against it. Evaluate: did it catch the real issues? Did it hallucinate any?
Exercise 2 — Build Level 3 (45 min): Write a small Python function with a known bug (e.g., off-by-one error in a loop). Write a test that catches it. Then run the iterative fix agent from Level 3 and watch it self-correct. Experiment with max_iterations — what happens at 1 vs. 3 vs. 10?
Exercise 3 — Design Level 4 (30 min, no code): Write a plain-English feature spec: "Add a /health endpoint that returns {status: 'ok', uptime_seconds: N, version: '1.0.0'}." Then sketch the state object the Level 4 agent would use. What fields does it need? What's the sequence of steps? What happens if step 3 fails?
Key Takeaways
- Level 1 (review) and Level 2 (test generation) are single LLM calls — ship them in an afternoon
- Level 3 adds the loop: run → fail → fix → rerun. This is where agents become genuinely useful
- Level 4 adds multi-step planning and state management — the agent tracks where it is in a pipeline
- Level 5 is long-horizon autonomy with stakeholder communication — frontier territory, not for most teams yet
- The feedback loop is the key differentiator — without it, you have a generator, not an agent
What's Next
Recommended next courses:
- AI Agents 101 — Part 1: What Is an AI Agent? — the foundational mental model and loop pattern used in every level above
- AI Agents 101 — Part 2: Give Your Agent Real Tools — error handling, web search, and code execution tools
- MCP 101: Build MCP Servers — connect your coding agents to GitHub, databases, and any API via a standard protocol
If you want to build alongside other serious AI builders:
Get the free AI Builder Newsletter
Weekly deep-dives on AI tools, automation workflows, and builder strategies. Join 5,000+ readers.
No spam. Unsubscribe anytime.
Go deeper with AI Builder Club
Join 1,000+ ambitious professionals and builders learning to use AI at work.
- ✓Expert-led courses on Cursor, MCP, AI agents, and more
- ✓Weekly live workshops with industry builders
- ✓Private community for feedback, collaboration, and accountability