
gstack: The Complete Course — Ship Like a Full Engineering Team, Solo

Garry Tan open-sourced his entire AI development workflow. gstack turns Claude Code into a virtual engineering team with 23 specialist roles — CEO reviewer, QA lead, security officer, and more. Here's who it's for, how it works, and why it might change how you ship software.

AI Builder Club · April 7, 2026 · 11 min read

Garry Tan — president of Y Combinator, former early engineer at Palantir — claims he's shipping 10,000–20,000 lines of production code per day. Part-time. While running YC full-time.

His secret isn't working harder. It's that he built himself a virtual engineering team out of AI agents, open-sourced the whole thing, and called it gstack.

This course breaks down what gstack actually is, who should use it, how it works under the hood, and why it represents a genuinely different way to think about AI-assisted development.


Who Is gstack For (And Who It's Not For)

Let's get this out of the way first, because gstack isn't for everybody.

You should pay attention if you're:

A solo founder shipping a product alone. This is the sweet spot. You're writing the code, designing the UI, handling security, doing your own QA, writing the docs, and managing releases. You know you should be doing code review, running security audits, and testing in a real browser — but there's one of you. gstack gives you a CEO reviewer, an engineering reviewer, a design reviewer, a QA lead, a security auditor, and a release engineer. All AI. All slash commands. A full security audit costs about $0.15 and takes 30 seconds.

A small team (2–5 people) that can't afford specialists. You have engineers, but you don't have a dedicated QA person. You don't have a security team. Nobody's doing design reviews. Your "release process" is git push and a prayer. gstack fills the gaps your team can't staff.

A developer already using AI coding tools who's tired of the blank prompt. You use Claude Code, or Codex, or Cursor, or Kiro. You know the tools are powerful. But you also know you're mostly using them as fancy autocomplete — type a prompt, get code, paste it in, move on. No review. No tests. No process. gstack gives you a structured workflow on top of whatever AI tool you're already using.

You probably don't need gstack if:

Your team already has full process coverage. If you have dedicated QA, a security team, design reviewers, and a release engineering process — gstack would be redundant. It solves the "we don't have those people" problem, not the "our existing people aren't good enough" problem.

You just want a code generator. If you're looking for something that writes code faster, gstack is the wrong tool. Its value is in the process around the code — the planning, reviewing, testing, and shipping. The code generation is the least interesting part.


The Problem gstack Solves

Here's something most people don't think about: shipping software is not the same thing as writing code.

The real workflow looks like this:

Think → Plan → Build → Review → Test → Ship → Reflect

That's seven steps. Most developers using AI skip straight to step three. They type "build me a login page," the AI writes it, they push it to prod. No one reviewed it. No one tested it in a browser. No one checked for SQL injection. No one asked "is this even the right thing to build?"

The blank prompt is the culprit. It doesn't tell you what you're skipping. It just sits there, waiting for instructions. And when you're coding alone at 2 AM, you're not going to voluntarily run a security audit on yourself.

This creates what Garry calls "process bankruptcy" — you have zero engineering process. Code goes from your brain to prod with nothing in between.

Big companies have the opposite problem: too much process. Meetings, Jira tickets, review queues, design committees. A one-line change takes a week to ship.

gstack sits in the gap. It gives you team-grade process that runs in seconds, not weeks.

The "90% Done" Trap

AI makes the first 90% of any feature trivially easy. Scaffold a project? Seconds. Write the basic logic? A minute. Get something that "works"? Five minutes.

The last 10% is where things break: edge cases, error handling, accessibility, mobile responsiveness, performance, security. That's where products actually fail in the real world.

The old thinking was: "the last 10% is expensive, so defer it." Ship fast, fix later.

gstack flips this. AI made the last 10% cheap too. Look at the actual numbers:

| Task | Traditional team | AI + gstack |
|------|-----------------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min |
| Writing tests | 1 day | 15 min |
| Full feature implementation | 1 week | 30 min |
| Bug fix + regression test | 4 hours | 15 min |
| Architecture / design review | 2 days | 4 hours |
| Security audit | $10,000+ | $0.15 |

When a security audit costs fifteen cents and a QA pass costs four dollars, there's no excuse for skipping them. The old trade-offs don't apply anymore.

gstack calls this principle Boil the Lake: if you can finish the whole thing in one session, finish the whole thing. Don't cut corners on something that's completable. Save the shortcuts for genuinely massive migrations that span quarters.


The One-Line Pitch

You become the CTO of a team of AI specialists. You direct; they execute. Each additional quality step — security audit, QA pass, design review — costs pennies and seconds instead of headcount and weeks.


How gstack Actually Works

gstack is a collection of 23 "skills" — each one is a Markdown file that turns your AI coding assistant into a specialist. You invoke them with slash commands.

The skills follow a sprint cycle. They're meant to be used in order, though you can skip steps or use them individually:

/office-hours → /autoplan → [build] → /qa → /review → /ship → /retro

Here's what each phase looks like in practice.

Phase 1: Think

/office-hours is where everything starts. It simulates a YC partner conversation. You describe what you want to build, and instead of jumping to code, it pushes back on your assumptions. It asks forcing questions: What's the actual problem? Who's the user? What would a 10-star version of this look like?

It outputs a design doc. No code. The whole point is to think before you build.

/plan-ceo-review then reads that design doc and challenges it from a product perspective. "Is this the right thing to build? Can we cut scope? What's the wedge?"

/plan-eng-review challenges from a technical angle. "What's the data model? What happens when this gets 10x traffic? What are the edge cases you haven't thought about?"

/plan-design-review rates your design on multiple dimensions (0–10 scale) and explains what a 10 looks like for each one.

Or just run /autoplan and it chains all the reviews together automatically.

Phase 2: Build

This is where you actually write code — with your AI assistant, however you normally do it. gstack doesn't change this step much. It adds design tools if you need them:

/design-shotgun generates 4–6 visual mockup variants for any UI, opens them side by side in your browser, and lets you pick favorites and iterate. It learns your taste over time.

/design-html takes an approved mockup and turns it into production-quality HTML/CSS that actually handles responsive layout correctly.

Phase 3: Verify

This is where gstack really earns its keep.

/qa is the flagship skill. It opens a real Chromium browser, navigates your app, finds bugs, fixes them in your code, and re-verifies the fix. It writes regression tests for every bug it finds. Three tiers: quick smoke test, standard (covers edge cases), and exhaustive (every interaction path, accessibility, responsive, performance).

/review does pre-merge code review. It reads your diff against the base branch and checks for SQL injection, prompt injection, scope drift, missing tests, and whether the code actually matches the plan.

/cso runs a security audit — OWASP Top 10 + STRIDE threat modeling. Each finding includes a concrete exploit scenario, not just a warning.

/investigate is for debugging. It has one hard rule: no fixes without finding the root cause first. This stops the AI from playing whack-a-mole with symptoms.

Phase 4: Ship

/ship is a single command that runs tests, does a code review pass, bumps the version, writes a changelog entry, commits, pushes, and opens a PR.

/land-and-deploy takes it further: merges the PR, waits for the deploy to finish, and verifies the deployment is healthy.

/canary monitors for post-deploy issues — console errors, performance regressions, page failures.

Phase 5: Reflect

/retro runs a weekly retrospective. It looks at your commit history, gives per-person breakdowns (useful for small teams), and tracks shipping streaks, test-health trends, and growth opportunities.

Safety Rails

AI agents with shell access can do real damage. gstack includes guardrails:

  • /careful warns before destructive commands like rm -rf, DROP TABLE, or force-pushes
  • /freeze restricts the AI to only edit files in one directory — so while you're debugging src/api/, it can't accidentally break src/auth/
  • /guard activates both at once

These exist because some mistakes don't have an undo button.
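The core of a guardrail like /careful is just pattern-matching commands before they run. Here's a minimal sketch of that idea — the patterns and function name are illustrative, not gstack's actual implementation:

```python
import re

# Illustrative patterns for commands that deserve a confirmation prompt.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-[a-z]*r[a-z]*f",    # rm -rf and flag-order variants
    r"\bdrop\s+table\b",          # SQL table drops
    r"git\s+push\s+.*--force",    # force-pushes
]

def is_destructive(command: str) -> bool:
    """Return True if the command matches a known-destructive pattern."""
    lowered = command.lower()
    return any(re.search(p, lowered) for p in DESTRUCTIVE_PATTERNS)

print(is_destructive("rm -rf node_modules"))      # True
print(is_destructive("git push origin --force"))  # True
print(is_destructive("ls -la"))                   # False
```

A real implementation would need to handle shell quoting and aliases, but the shape is the same: intercept, match, ask before executing.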


What's Actually Happening Under the Hood

Skills Are Just Markdown

Every gstack skill is a Markdown file. Not a Python script. Not a TypeScript module. Prose.

The AI reads a SKILL.md the way a senior engineer reads a runbook. Each file contains a name, step-by-step instructions, and decision points written in plain English ("If the project uses TypeScript, do X. Otherwise, do Y.").
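For a feel of the format, a minimal skill file might look something like this — the frontmatter fields and wording here are a hypothetical sketch, not copied from the repo:

```markdown
---
name: investigate
description: Root-cause debugging. No fixes until the cause is found.
---

# Investigate

1. Reproduce the bug and capture the exact error output.
2. If the project uses TypeScript, check the compiler output first.
   Otherwise, start from the stack trace.
3. Trace backwards from the failure to the first incorrect state.
4. Only after the root cause is confirmed: propose a fix and a
   regression test that would have caught it.
```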

This is a deliberate design choice. Markdown is:

  • Readable — you can audit exactly what the AI is being told to do
  • Forkable — don't like how a skill works? Edit the prose. No debugging required.
  • Universal — works on Claude Code, Codex, Cursor, Kiro, and four other hosts with zero code changes

Skills get generated from .tmpl template files using a placeholder system. Shared logic (like "how to detect the base branch" or "how to initialize the browser") gets injected automatically so skills stay DRY.
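The placeholder mechanism can be sketched roughly like this — the `{{...}}` syntax, snippet names, and registry are assumptions for illustration, not gstack's actual template system:

```python
import re

# Hypothetical registry of shared prose snippets injected into skills.
SHARED_SNIPPETS = {
    "detect_base_branch": "Run `git remote show origin` and use the HEAD branch.",
    "init_browser": "Connect to the persistent Chromium daemon before navigating.",
}

def render_skill(template: str) -> str:
    """Replace {{snippet_name}} placeholders with shared snippets."""
    def inject(match: re.Match) -> str:
        return SHARED_SNIPPETS[match.group(1)]
    return re.sub(r"\{\{(\w+)\}\}", inject, template)

tmpl = "## Setup\n\n{{detect_base_branch}}\n{{init_browser}}\n"
print(render_skill(tmpl))
```

Every skill that needs to know the base branch gets the same instructions, and fixing that logic in one place fixes it everywhere.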

The Browser Is a Persistent Daemon

The /qa and /browse skills don't spin up a new browser for every command. gstack runs a persistent Chromium server that stays alive between commands. Each command (click, screenshot, navigate) takes about 100ms.

The clever part is the snapshot system. Instead of using fragile CSS selectors or XPaths, gstack converts a page into an accessibility tree and assigns refs like @e1, @e2, @e3 to interactive elements. The AI says "click @e5" instead of trying to target div.container > ul > li:nth-child(3) > a. Way more reliable.
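The ref-assignment idea can be sketched as a walk over a simplified accessibility tree — the node shape and the set of roles counted as interactive are assumptions here, not gstack's real schema:

```python
# A simplified accessibility-tree node: role, name, children.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}

def assign_refs(node, refs=None, counter=None):
    """Walk the tree and give each interactive element a stable @eN ref."""
    if refs is None:
        refs, counter = {}, [0]
    if node["role"] in INTERACTIVE_ROLES:
        counter[0] += 1
        refs[f"@e{counter[0]}"] = node
    for child in node.get("children", []):
        assign_refs(child, refs, counter)
    return refs

page = {
    "role": "main", "name": "", "children": [
        {"role": "heading", "name": "Login"},
        {"role": "textbox", "name": "Email"},
        {"role": "textbox", "name": "Password"},
        {"role": "button", "name": "Sign in"},
    ],
}
refs = assign_refs(page)
# The agent can now say "click @e3" instead of writing a CSS selector.
print(refs["@e3"]["name"])  # Sign in
```

Because the refs come from the accessibility tree rather than the DOM layout, they survive the cosmetic restructuring that breaks selector-based automation.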

It Gets Smarter Over Time

gstack has a learnings system. Every time the AI discovers something about your project — the test command, a deploy quirk, an architectural decision — it saves it as a JSON line in a local file. Next session, it searches past learnings so it doesn't ask you the same question twice.
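A JSON-lines learnings store is simple enough to sketch in a few lines — the field names and file location here are hypothetical, chosen just to show the append-and-search pattern:

```python
import json
import os
import tempfile

# Hypothetical learnings file: one JSON object per line (JSONL).
path = os.path.join(tempfile.gettempdir(), "learnings.jsonl")
open(path, "w").close()  # start fresh for the demo

def save_learning(topic: str, fact: str) -> None:
    """Append one learning as a single JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps({"topic": topic, "fact": fact}) + "\n")

def search_learnings(query: str) -> list[dict]:
    """Return every saved learning whose topic or fact mentions the query."""
    with open(path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if query in e["topic"] or query in e["fact"]]

save_learning("testing", "Run `npm test -- --runInBand` (parallel runs flake).")
save_learning("deploy", "Staging deploys require the VPN.")
print(search_learnings("test")[0]["fact"])
```

Append-only JSONL keeps writes cheap and makes the file trivially human-auditable: one line, one fact.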

Run /learn to see what gstack knows about your project, prune bad entries, or export the whole thing.

Works on 8 Different AI Agents

gstack isn't locked to Claude Code. It supports Claude Code, Codex, Cursor, Kiro, Factory, OpenCode, Slate, and OpenClaw. Same skill templates, different generated outputs per host. The skills encode the process; the generation layer adapts to each agent's API and file conventions.


The Deeper Idea: Process Over Code

Most people think of AI coding tools as code generators. Better autocomplete. Faster typing.

gstack represents a different thesis: AI is most valuable when it runs engineering process, not when it writes code.

Think about it. The code generation part — you already had that. Copilot, Claude, ChatGPT, they all write code. The thing nobody had was the process around the code: the review that catches the bug before it ships. The QA pass that finds the broken mobile layout. The security audit that spots the SQL injection. The retro that identifies why the last sprint went sideways.

A solo developer running the full gstack loop — /office-hours → /autoplan → build → /qa → /review → /ship → /retro — ends up with more process rigor than most 5-person teams. Not because they're more disciplined, but because each step costs almost nothing to run.

This changes the economics in a fundamental way. Quality used to require headcount. Now it requires a slash command.

Why Roles Beat Prompts

A single prompt that says "be a good engineer" produces mediocre results. It tries to do everything and excels at nothing.

gstack splits the work into specialized roles. A CEO reviewer and an eng reviewer will disagree — one wants to expand scope, the other wants to constrain it. That tension is the point. It surfaces trade-offs you wouldn't see if one generalist prompt tried to evaluate everything at once.

Each reviewer catches a different class of problems because they look at the work through a different lens:

  • The CEO reviewer asks: "Does anyone actually want this?"
  • The eng reviewer asks: "Will this break at scale?"
  • The QA lead asks: "Does this actually work in a browser?"
  • The security officer asks: "Can this be exploited?"

You, the human, make the final call. The AI recommends. You decide.

Auditability Matters

Every skill is open source prose. You can read exactly what the AI is being told to do. You can verify that /ship runs tests before committing. You can confirm that /careful intercepts force-pushes.

Compare that to proprietary AI coding products where the system prompt is hidden. When AI agents can run shell commands and modify your codebase, knowing what they've been instructed to do isn't a nice-to-have — it's a security requirement.


Extending gstack

gstack is MIT licensed and designed to be forked.

Writing your own skill is straightforward: create a .tmpl file with frontmatter (name, triggers) and step-by-step instructions in Markdown. Use plain English for logic instead of bash conditionals. Never hardcode branch names or framework commands — read them from project config.

Adding a new AI host is one TypeScript config file that defines path rewrites, tool rewrites, and skill filtering. Register it and run the generator.

The testing setup has three tiers: free unit tests (instant), LLM evals (~$0.15 per run), and full E2E tests (~$3.85 per run). Tests auto-select based on what files changed in your git diff.
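Auto-selection from the diff might work along these lines — the file-to-tier mapping below is a guess for illustration, not the repo's actual logic:

```python
def select_test_tier(changed_files: list[str]) -> str:
    """Pick the cheapest test tier that covers what changed."""
    if any(f.endswith((".html", ".css")) or "/ui/" in f for f in changed_files):
        return "e2e"        # UI changes need a real browser (~$3.85/run)
    if any(f.endswith((".tmpl", ".md")) for f in changed_files):
        return "llm-eval"   # prompt/skill changes need an LLM judge (~$0.15/run)
    return "unit"           # everything else: free, instant unit tests

print(select_test_tier(["src/generator.ts"]))   # unit
print(select_test_tier(["skills/qa.tmpl"]))     # llm-eval
print(select_test_tier(["app/ui/nav.css"]))     # e2e
```

The point of the tiering is economic: you only pay for the expensive browser run when the diff actually touches something a browser can see.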

The whole thing is dogfooded — gstack is built using gstack. Garry uses it on real products, feels friction, fixes the skill template, regenerates, and the change is live immediately.


Getting Started

Install takes 30 seconds. Open Claude Code and paste:

git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup

Then try this sequence on whatever you're building:

  1. /office-hours — describe what you're building, let it push back on your assumptions
  2. /plan-ceo-review — challenge the scope
  3. Build the feature
  4. /review — catch bugs before they ship
  5. /qa — test it in a real browser
  6. /ship — tests, PR, changelog, done

You'll know within 20 minutes if this is for you.


The Bottom Line

gstack isn't a better prompt. It's a process. It takes the seven-step loop that real engineering teams follow — think, plan, build, review, test, ship, reflect — and makes each step cost almost nothing.

The result: one person ships with more rigor than most teams. Not because they're smarter or more disciplined, but because the marginal cost of quality dropped to near-zero.

That's the actual shift happening right now. Not "AI writes code faster." It's that the process around the code — the part that used to require hiring five people — is now accessible to anyone willing to type a slash command.

Fork gstack on GitHub. It's free, MIT licensed, and open source.
