Crabbox: Cloud Sandboxes for Parallel Coding Agents
OpenClaw creator Peter Steinberger's new project Crabbox gives each parallel coding agent an isolated cloud box to test in. Why the merge bottleneck is the real problem, how Crabbox works, and the codebase harness that productizes it.
When you run ten or fifteen coding agents in parallel, writing code stops being the bottleneck. Merging it becomes the bottleneck. Every agent produces a pull request, and every PR carries the risk of breaking the system, so the real work shifts from "make the agent write code" to "let the agent prove its code works before a human ever looks at it."
That is the problem Crabbox - the new project from OpenClaw creator Peter Steinberger - is built to solve. This walkthrough covers why parallel testing breaks on your laptop, how Crabbox fixes it, and the open-source codebase harness that productizes the setup.
The merge bottleneck is the new bottleneck
Once your team sets up loops - agents that get triggered on their own, pick up work, and ship it - you end up with a lot of sessions running at once. At almost any given time I have at least ten agent sessions going; Peter has posted screenshots of fifteen-plus in parallel. The result is a massive volume of PRs that simply was not possible before.
That volume is the problem. Each PR has to be reviewed, and each one might break production. So the constraint moves up the stack:
  OLD bottleneck            NEW bottleneck
┌────────────────┐      ┌──────────────────────┐
│ writing code   │ ───► │ getting code MERGED  │
│ (agents solved │      │ verifying + trusting │
│  this)         │      │ 15 parallel PRs      │
└────────────────┘      └──────────────────────┘
The fix is not "review faster." It is to give each agent the tools to verify its own work and attach evidence - a passing test, a screenshot, a screen recording - to the PR, so a human is reviewing proof, not guessing. That is the first job of a codebase harness: make the repo agent-ready so agents can run, test, and verify before they hand work back.
Why parallel testing breaks on your laptop
Giving one agent a browser tool and a dev server works fine. The setup falls apart when many agents run at once, because they share your machine:
- One database, one schema. If a local Supabase or Postgres instance is shared, one agent trying a new schema migration can break every other session at once.
- Hardcoded ports. Most repos are not set up to run multiple instances side by side. Ports are usually pinned in config for good reasons, so a second dev server collides with the first.
- One Docker daemon, one OS. Parallel sessions share the same daemon and resources, so they step on each other.
- Resource limits. A modern production repo is heavy. Running many full instances locally is slow and often impossible.
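The hardcoded-port failure is easy to reproduce. Below is a minimal Python sketch of our own; it is not Crabbox code. `start_dev_server` is a hypothetical stand-in for a real dev server. If two "agents" bind the same pinned port, they collide. If each asks the OS for port 0, each gets its own free port. Even with unique ports, though, the agents would still share the database and the Docker daemon, so port juggling alone does not solve the problem.

```python
import socket


def start_dev_server(port: int) -> socket.socket:
    """Hypothetical stand-in for a dev server: listen on 127.0.0.1:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()  # don't leak the socket if the port is taken
        raise
    return sock


def try_two_servers(port: int) -> str:
    """Start two 'agents' on the same pinned port and report the outcome."""
    first = start_dev_server(port)
    try:
        second = start_dev_server(port)  # second agent, same hardcoded port
    except OSError:
        return "collision"
    else:
        second.close()
        return "ok"
    finally:
        first.close()


# Find a free port, then pin both agents to it, like a port fixed in repo config.
probe = start_dev_server(0)
pinned_port = probe.getsockname()[1]
probe.close()
print(try_two_servers(pinned_port))  # collision

# Port 0 lets the OS pick any free port, so the two servers do not clash.
a, b = start_dev_server(0), start_dev_server(0)
print(a.getsockname()[1] != b.getsockname()[1])  # True
a.close()
b.close()
```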
Git work trees solve the code isolation problem - each agent gets its own checkout to modify - but they do not solve the runtime problem. Agents still need a live dev server and database to actually test against, and that is what collides.
The fix: one isolated sandbox per work tree
The right architecture is to stop running every session on your local machine and instead give each agent its own isolated cloud environment - its own box, its own database, its own dev server - that cannot affect any other session.
LOCAL (work trees: code isolation only)
    agent A          agent B          agent C
       │                │                │
       v                v                v
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ CLOUD BOX A │  │ CLOUD BOX B │  │ CLOUD BOX C │  runtime isolation
│ own DB      │  │ own DB      │  │ own DB      │
│ own server  │  │ own server  │  │ own server  │
└─────────────┘  └─────────────┘  └─────────────┘
     test             test             test
(no shared schema, ports, or daemon - they can't break each other)
Building that pipeline by hand is real work: spin up a spot instance, mount a disk, copy the code in, install dependencies, start the app, open a browser to test, then tear it all down afterward. My team built exactly this, and it unlocked a lot. One rough edge remained, though: when an agent finds a bug mid-test, getting the local fix back into the cloud box is awkward. The normal commit-push-CI flow floods the repo with throwaway commits, and you do not want to rebuild the box every time.
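The hand-built pipeline is a lifecycle with one hard requirement: the box must be torn down even when setup or a test fails, or idle machines keep running. Below is a minimal Python sketch of that guarantee. It assumes a hypothetical `FakeProvider` in place of a real cloud API. None of the names here come from Crabbox or any provider SDK.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a cloud API; records every call it receives."""
    log: list[str] = field(default_factory=list)
    live: set[str] = field(default_factory=set)

    def create_box(self, name: str) -> str:
        self.log.append(f"create {name}")
        self.live.add(name)
        return name

    def run_command(self, box: str, command: str) -> int:
        self.log.append(f"run {box}: {command}")
        return 1 if "failing" in command else 0  # simulated exit code

    def delete_box(self, box: str) -> None:
        self.log.append(f"delete {box}")
        self.live.discard(box)


@contextmanager
def sandbox(provider: FakeProvider, name: str):
    """Create a box, run setup, yield it, and always delete it afterward."""
    box = provider.create_box(name)
    try:
        # The manual steps: mount disk, copy code, install deps, start the app.
        for step in ("mount-disk", "copy-code", "install-deps", "start-app"):
            if provider.run_command(box, step) != 0:
                raise RuntimeError(f"setup step failed: {step}")
        yield box
    finally:
        provider.delete_box(box)  # teardown runs on success and on failure


provider = FakeProvider()
try:
    with sandbox(provider, "agent-a") as box:
        if provider.run_command(box, "run failing e2e test") != 0:
            raise RuntimeError("test failed")
except RuntimeError as err:
    print(err)        # test failed
print(provider.live)  # set() - no leaked boxes
```

A context manager ties teardown to scope. An agent that crashes halfway through a test still releases its box.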
What Crabbox does
Crabbox closes that last gap. It lets an agent warm up a cloud box, sync the uncommitted diff straight from the local work tree, and run tests against it in real time - so you can change a file locally and retest in seconds.
The workflow an agent runs:
- Warm up a box - spin up an isolated cloud sandbox on demand.
- Run any command in the cloud - the agent runs bash commands as if local, but they execute in the box. Each run first syncs the dirty diff from your local work tree (no commit needed - the folder just has to be git-initialized), then runs the command against the latest code.
- Generate evidence - collect screenshots, record video of the run, and publish artifacts to storage (such as an S3 bucket) so they can be posted inline as PR comments.
- Stop the box - tear it down and delete it when the task is done.
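The evidence step ends with a PR comment. Below is a minimal Python sketch of that last hop. It assumes the artifacts have already been uploaded and you have their public URLs. The `Artifact` type, the `example.com` URLs, and the comment layout are all hypothetical; they are not Crabbox's output format.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    kind: str   # hypothetical kinds: "screenshot", "video", or "log"
    url: str    # public URL after upload (for example, to an S3 bucket)
    label: str


def render_pr_comment(command: str, exit_code: int, artifacts: list[Artifact]) -> str:
    """Build a Markdown PR comment: pass/fail status plus inline evidence."""
    status = "PASSED" if exit_code == 0 else f"FAILED (exit {exit_code})"
    lines = [f"**Sandbox run:** `{command}` - {status}", ""]
    for art in artifacts:
        if art.kind == "screenshot":
            lines.append(f"![{art.label}]({art.url})")  # image renders inline
        else:
            lines.append(f"- [{art.label}]({art.url})")  # videos and logs as links
    return "\n".join(lines)


comment = render_pr_comment(
    "npm test",
    0,
    [
        Artifact("screenshot", "https://artifacts.example.com/run-1/login.png", "Login page"),
        Artifact("video", "https://artifacts.example.com/run-1/run.mp4", "Test run recording"),
    ],
)
print(comment)
```

The reviewer sees the status line first and the screenshots inline, so the comment is proof they can check without rerunning anything.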
agent finds a bug ─► fixes file LOCALLY ─► next `run` auto-syncs the diff
        ▲                                                │
        └─────────────── retest in the cloud ◄───────────┘
no extra commits, no box rebuild - always testing the latest version
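Crabbox syncs the uncommitted diff from the git work tree. To show the idea without depending on git, the sketch below swaps in a simpler technique. It hashes every file, skips excluded folders, and ships only the files whose hash changed since the last run. This is a minimal Python illustration, not Crabbox's sync algorithm. The `EXCLUDES` set is an assumed example.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumed example excludes, similar in spirit to the sync excludes in crabbox.yml.
EXCLUDES = {".git", "node_modules", "dist"}


def snapshot(root: Path) -> dict[str, str]:
    """Map each non-excluded file's relative path to a SHA-256 of its contents."""
    hashes = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if any(part in EXCLUDES for part in rel.parts) or not path.is_file():
            continue
        hashes[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def plan_sync(previous: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to upload, files to delete) on the remote box."""
    upload = sorted(p for p, h in current.items() if previous.get(p) != h)
    delete = sorted(p for p in previous if p not in current)
    return upload, delete


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.ts").write_text("export const x = 1;\n")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "big.js").write_text("// never synced\n")
    first = snapshot(root)
    print(plan_sync({}, first))  # (['src/app.ts'], [])

    # The agent fixes a bug locally; the next run ships only what changed.
    (root / "src" / "app.ts").write_text("export const x = 2;\n")
    (root / "src" / "new.ts").write_text("export {};\n")
    print(plan_sync(first, snapshot(root)))  # (['src/app.ts', 'src/new.ts'], [])
```

Hashing the whole tree is simple, but it rereads every file on every run. A git-based diff, which is what the walkthrough describes, avoids that cost by asking git what changed.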
The whole thing is configured with three files:
- A Dockerfile that encapsulates everything your local machine has - Node, Docker, the Supabase CLI, any tool the app needs.
- A `crabbox.yml` that defines the sandbox provider (the video uses Daytona for fast startup via prebuilt snapshots), the default work root, which folders to exclude from sync (heavy or unneeded ones - node_modules and build dirs are usually gitignored already), and the environment variables to pass. Those variables are pushed to the box over an encrypted SSH connection, so the data plane stays relatively safe.
- A `setup.sh` script, so the agent runs one command to install dependencies and bring the dev server up instead of stepping through it each time.
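Listing the environment variables to pass amounts to an allowlist: only the named variables leave your machine. Below is a minimal Python sketch of that filtering step. `env_for_box` and the variable names are hypothetical. The sketch does not reproduce Crabbox's config schema or its SSH transport.

```python
import os
from collections.abc import Mapping


def env_for_box(allowlist: list[str], local_env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Select only allowlisted variables to forward; fail fast on missing ones."""
    missing = [name for name in allowlist if name not in local_env]
    if missing:
        raise KeyError(f"missing env vars for sandbox: {', '.join(missing)}")
    return {name: local_env[name] for name in allowlist}


# Hypothetical example: the box needs two variables and nothing else from the laptop.
local = {"APP_ENV": "test", "TEST_API_KEY": "dummy", "HOME": "/home/me"}
print(sorted(env_for_box(["APP_ENV", "TEST_API_KEY"], local)))
# ['APP_ENV', 'TEST_API_KEY']
```

If a variable is missing, the box fails before it is built, instead of halfway through a test run.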
Exact command names and flags live in the Crabbox project and the walkthrough video - treat the steps here as the shape of the workflow, not copy-paste syntax.
The codebase harness that productizes this
You do not have to wire all of this together from scratch. We packaged the setup my team runs into an open-source Claude Code plugin: the AI Builder Club skills repo. It ships two flagship skill sets:
- Codebase harness - make any repo agent-ready so agents can run, test, verify, and ship, including an isolated cloud box per agent so loops can ship code in parallel.
- Loops - spin up compounding agent loops on a shared, file-based knowledge base.
Install it in Claude Code:
/plugin marketplace add AI-Builder-Club/skills
/plugin install skills@ai-builder-club
Then the two entry points:
- /setup-codebase-harness - run it in the repo your agents work in, so they can run, test, and verify their own work (this is the harness Crabbox plugs into).
- /new-loop - run it where your agent's memory should live; it bootstraps the shared knowledge base and scaffolds your first compounding loop.
If you want the full walkthrough of the concept - the four ingredients of a loop and how the harness fits - the loop engineering guide is the written companion, and AI Agent Reliability and Cost Control covers why agent-generated evidence is what makes parallel work safe to merge.
Go deeper:
- AI Agent 101 Course - build agents that run, test, and verify their own work, then deploy them
- Claude Code 101 Course - CLAUDE.md, hooks, subagents, and work trees, the foundation a codebase harness sits on
Want the step-by-step build alongside a team running this in production? Join AI Builder Club.
Frequently Asked Questions
What is Crabbox?
Crabbox is a project from OpenClaw creator Peter Steinberger that gives coding agents an isolated cloud sandbox to test in. An agent warms up a box, syncs the uncommitted diff from its local work tree, runs commands and tests in the cloud in real time, generates artifacts like screenshots and video, then tears the box down. It targets teams running many agents in parallel.
Why do parallel coding agents break on a local machine?
Because they share your machine. A shared database and schema, hardcoded ports, a single Docker daemon, and finite resources mean one agent's migration or dev server can break every other session. Git work trees isolate the code but not the runtime, which is what collides.
How is Crabbox different from a git work tree?
A work tree gives each agent an isolated copy of the code to modify. Crabbox gives each agent an isolated runtime - its own cloud box, database, and dev server - so agents can actually run and test in parallel without affecting each other. They solve different halves of the problem and are used together.
What is a codebase harness?
A codebase harness is the setup that makes a repo agent-ready: it gives agents the tools and skills to run, test, and verify their own work and produce evidence (passing tests, screenshots, recordings) to attach to a PR. The AI Builder Club skills repo provides one via the /setup-codebase-harness command.
Do I need Crabbox to run parallel agents?
No, but you need something that solves runtime isolation. You can build the spin-up, sync, test, and teardown pipeline yourself, or use Crabbox plus a codebase harness to get it without the plumbing. The point is each agent needs its own isolated environment to test in, however you provide it.
What sandbox provider does it use?
The walkthrough uses Daytona because its sandboxes start quickly from prebuilt snapshots, but the provider is configured in crabbox.yml, so it is not the only option. You define the provider, the snapshot/image, sync rules, and the environment variables passed to the box.
Related Content
- OpenClaw: What It Is and How to Use It - The prior project from Crabbox's creator, Peter Steinberger.
- Loop Engineering: Generators, Verifiers, and Stop Conditions - The concept the codebase harness serves: agents that ship and verify on their own.
- AI Agent Reliability and Cost Control - Why agent-produced evidence is what makes parallel PRs safe to merge.
- Dynamic Workflows: Orchestrate Subagents at Scale - Running many agents from one script once the harness is in place.
- Harness: The 6 Components - Where run/test/verify fits in the larger harness picture.
Start Here
Pick the repo your agents already work in and run /setup-codebase-harness from the AI Builder Club skills repo. Get one agent running its own tests and attaching evidence to a PR before you scale to ten. The merge bottleneck only gets worse with volume, so solve verification first.
For the full build alongside a team running this in production, join the AI Builder Club.
Sources & Verification
Based on AI Jason's June 2026 walkthrough video and the AI Builder Club open-source skills repo, which productizes the codebase-harness setup his team runs in production. Crabbox is Peter Steinberger's project; verify exact command names and flags against the project and the video, as both move fast. See our editorial standards.
- OpenClaw Creator's new secret project (AI Jason, YouTube) - the walkthrough this article is based on: Crabbox, the merge bottleneck, and the codebase-harness setup
- AI Builder Club Skills (GitHub) - open-source Claude Code plugin with /setup-codebase-harness and /new-loop, the productized version of the setup
- OpenClaw (GitHub) - Peter Steinberger's prior project, for context on the creator