Codex CLI Guide (2026): How It Works, Costs, Models
Codex CLI decoded for builders who already use Claude Code: the auth fork that controls your billing, the 2026 model decision tree, how the engine works, and when Codex actually beats Claude Code.
Codex CLI is OpenAI's terminal coding agent, written in Rust, that reads and runs code on your machine. If you already live in Claude Code or Cursor, the reason to care is simple: independent tests put Codex at roughly 4x fewer tokens per task and ahead on terminal-native work. The reason most people get it wrong is also simple. The first screen asks you to sign in, and that one choice quietly decides your bill, your models, and your rate limits.
This is the decode for builders who already use a coding agent. Not "what is an AI agent." How Codex actually works, what it really costs in June 2026, and when it beats the tool you already have.
What Codex CLI Actually Is
Codex CLI is a local agent. It runs in your terminal, reads your repo, edits files, runs commands, and reviews its own diffs. Same shape as Claude Code. The interesting part is underneath.
It is a single statically compiled Rust binary. That matters: you can run it with no Node.js at all from the prebuilt GitHub release, and it is fast. Safety is enforced at the OS kernel layer through a sandbox, not at the application layer. Claude Code governs behavior with two dozen lifecycle hooks you wire up yourself. Codex governs it with kernel-level isolation the model cannot talk its way around. These are two different philosophies for the same problem: letting an agent run commands without letting it wreck your machine.
One honest caveat before you commit: OpenAI's own README calls Codex CLI "an experimental project under active development ... not yet stable, may contain bugs." The repo sits at roughly 92K stars and 7,400+ open issues as of June 2026. Fast-moving, rough edges included. Plan for breaking changes between versions.
If you want to understand why both tools converge on the same 5-layer shape underneath, that is exactly what our agent engineering work breaks down.
Install It in Two Minutes (and the One Mistake Everyone Makes)
The single most common install failure is the package name. There is an unscoped codex package on npm from 2012 that has nothing to do with OpenAI. It installs silently and does nothing.
Use the scoped name. You need Node.js 22 or later for the npm path.
```shell
# Correct - the scoped OpenAI package
npm install -g @openai/codex
# Verify
codex --version
```
On macOS and Linux this is the whole story. On Windows, native PowerShell works now, but WSL2 is still the smoother path. If you do not want Node at all, grab the prebuilt Rust binary from GitHub releases.
Then run codex in a project directory and you are in an interactive session.
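If the npm path fails, an old Node.js is the next suspect after the package name. Here is a minimal sketch of a pre-install check for the Node 22+ requirement. The function name `node_major_ok` is hypothetical; it only parses the version string that `node --version` prints (for example `v22.3.0`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a Node.js version string is 22 or later.
# Usage: node_major_ok "$(node --version)" && npm install -g @openai/codex
node_major_ok() {
  local version="${1#v}"        # strip the leading "v" that node prints
  local major="${version%%.*}"  # keep everything before the first dot
  # Reject anything that is not a plain integer (empty or garbled input).
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -ge 22 ]
}
```

Gating the install on this check fails fast with a clear reason instead of a half-working global install.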
The Auth Fork That Decides Everything
Here is the part the install tutorials skip. The first time you run Codex, it asks: sign in with ChatGPT, or paste an API key. These are not two doors to the same room. They are two different products.
| | ChatGPT sign-in | API key |
|---|---|---|
| Billing | Credit-based, tied to your ChatGPT plan | Pay-per-token on your API account |
| Models | Curated picker (GPT-5.5 default) | Any model your key can access |
| Cloud features | Code review, cloud tasks, Slack | CLI/SDK/IDE only, no cloud |
| Rate limits | 5-hour windows + weekly caps | Governed by API tier |
If you are on ChatGPT Plus, Pro, or Enterprise, sign-in is included and you spend credits. If you are running Codex in CI or want a specific model, the API key is the path, and you pay per token. Picking the wrong one is how people end up confused about why a model "disappeared" from their picker or why their bill looks nothing like the pricing page.
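The rule of thumb above fits in a few lines. This is a sketch, not part of Codex: `choose_auth_path` is a hypothetical helper, and it assumes your CI system sets a non-empty `CI` variable (most do) and that the API key lives in `OPENAI_API_KEY`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which auth path a run should use.
# Usage: choose_auth_path [pinned-model]
choose_auth_path() {
  local pinned_model="${1:-}"
  # CI runs and pinned models both point at the API-key path.
  if [ -n "${CI:-}" ] || [ -n "$pinned_model" ]; then
    if [ -z "${OPENAI_API_KEY:-}" ]; then
      echo "error: API-key path chosen but OPENAI_API_KEY is not set" >&2
      return 1
    fi
    echo "api-key"   # pay-per-token, any model the key can access
  else
    echo "chatgpt"   # plan credits, curated picker, cloud features
  fi
}
```

Failing loudly when the key is missing in CI is deliberate: a headless job has no one around to click through a ChatGPT sign-in.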
One trap worth naming: OpenAI removed GPT-5.2 variants from ChatGPT sign-in on June 12, 2026, and existing chats auto-migrated to GPT-5.5, the most expensive model. If you were not watching, your per-message cost jumped. Check your model.
The 2026 Model Decision Tree
Codex gives you a model picker, and the defaults are not always the cheapest for the job. As of June 2026, here is the real lineup and what each is for.
| Model | Input / 1M | Output / 1M | Use it for |
|---|---|---|---|
| gpt-5-codex | $1.25 | $10.00 | Coding, code review, goal mode. The coding default. |
| gpt-5.4 | $2.50 | $15.00 | General-purpose work with strong coding |
| gpt-5.5 | $5.00 | $30.00 | Complex reasoning, research, frontier tasks |
| gpt-5.4-mini | varies (cheapest) | varies | Fast, cheap subagent and routine work |
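To see what the per-1M prices mean for a real job on the API-key path, here is a small cost estimator using the June 2026 list prices from the table. `job_cost` is a hypothetical helper; gpt-5.4-mini is left out because its price is not listed here, and ChatGPT sign-in bills in credits, not these rates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate API cost in dollars for one job.
# Usage: job_cost MODEL INPUT_TOKENS OUTPUT_TOKENS
job_cost() {
  local model="$1" input_tokens="$2" output_tokens="$3" in_price out_price
  case "$model" in
    gpt-5-codex) in_price=1.25; out_price=10.00 ;;
    gpt-5.4)     in_price=2.50; out_price=15.00 ;;
    gpt-5.5)     in_price=5.00; out_price=30.00 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # Prices are per 1M tokens, so divide the weighted sum by 1,000,000.
  awk -v i="$input_tokens" -v o="$output_tokens" -v ip="$in_price" -v op="$out_price" \
    'BEGIN { printf "%.2f\n", (i * ip + o * op) / 1000000 }'
}
```

For a job with 1M input and 100K output tokens, `job_cost gpt-5-codex 1000000 100000` prints `2.25`, while the same call with `gpt-5.5` prints `8.00`.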
The move most people miss: gpt-5-codex has a 400K-token context window, is purpose-built for code, and costs a quarter of gpt-5.5 on input and a third on output. For day-to-day building, route coding tasks to gpt-5-codex and only reach for gpt-5.5 when you genuinely need frontier reasoning. You can wire this with named profiles in ~/.codex/config.toml so different task types hit different models automatically.
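A minimal sketch of that routing, assuming Codex reads `[profiles.<name>]` tables from `~/.codex/config.toml` and selects one with `codex --profile <name>` (check the config reference for your version). The profile names `code` and `deep` and both helper functions are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append two model profiles to a Codex config file once.
write_profiles() {
  local config_file="$1"
  mkdir -p "$(dirname "$config_file")"
  # Skip if the profiles were already added; duplicate TOML tables are invalid.
  if grep -q '^\[profiles\.code\]' "$config_file" 2>/dev/null; then
    return 0
  fi
  cat >> "$config_file" <<'EOF'
[profiles.code]
model = "gpt-5-codex"

[profiles.deep]
model = "gpt-5.5"
EOF
}

# Hypothetical router: map a task kind to a profile name.
route_task() {
  case "$1" in
    research|architecture|frontier) echo "deep" ;;  # frontier reasoning only
    *) echo "code" ;;                                # default: cheaper coding model
  esac
}

# Example: codex --profile "$(route_task review)"
```

Defaulting to the cheaper profile and opting in to gpt-5.5 by name keeps the expensive model from becoming the silent default.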
One thing to verify yourself: context window varies by model and surface. The API exposes 400K on gpt-5-codex, but in practitioner testing Nate Herk clocked GPT-5.5 in Codex at roughly 256K. Check /model and your plan before assuming the big number.
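Before handing a large file to a session, a rough size check helps you avoid assuming the bigger window. This sketch assumes about 4 bytes per token, a common rule of thumb for English and code; real tokenizer counts differ, so treat the result as an estimate. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate tokens in a file at ~4 bytes per token.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1") || return 1
  echo $(( (bytes + 3) / 4 ))   # round up
}

# Hypothetical helper: exit 0 if the estimate fits in WINDOW_TOKENS.
# Usage: fits_context FILE WINDOW_TOKENS
fits_context() {
  local tokens
  tokens=$(estimate_tokens "$1") || return 1
  [ "$tokens" -le "$2" ]
}
```

If you are unsure which window your surface exposes, pass the smaller number (roughly 256K) rather than 400K.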
How the Engine Works
Once you are in, a few capabilities are worth knowing because they are where Codex earns its keep.
Approval modes. You choose how much leash the agent gets before it edits or runs anything: suggest, auto-edit, or full-auto. Full-auto plus the kernel sandbox is the combination that makes unattended runs safe enough to walk away from.
exec scripting. codex exec runs Codex non-interactively. This is the real unlock for automation: drop it in a CI step or a shell script and Codex runs a defined task headless, no TUI. If your goal is a repeatable loop rather than a chat, this is the entry point.
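Here is a sketch of a CI step built on `codex exec "<prompt>"`. The wrapper name and the `CODEX_BIN` override are hypothetical; the override exists only so the step can be dry-run with `echo` before it spends tokens.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper: run one headless Codex task.
# CODEX_BIN defaults to the real binary; set CODEX_BIN=echo for a dry run.
run_headless_task() {
  local prompt="$1"
  if [ -z "$prompt" ]; then
    echo "usage: run_headless_task \"<task prompt>\"" >&2
    return 2
  fi
  "${CODEX_BIN:-codex}" exec "$prompt"
}

# Example CI usage:
#   run_headless_task "Fix the failing lint errors and summarize the changes"
```

Keeping the prompt as a single argument means the task definition lives in your pipeline config, where it is versioned and reviewable.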
Subagents. Codex can parallelize a complex task across subagents with isolated context, the same pattern Claude Code uses for fan-out work.
AGENTS.md. Instead of a tool-specific config, Codex reads AGENTS.md, an open convention that works across 8+ agent tools. Write your project conventions once, use them in any compatible agent. This is the cross-tool answer to Claude Code's CLAUDE.md.
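AGENTS.md is free-form Markdown, so getting started is just writing a file at the repo root. This sketch creates a starter file only if none exists. The function name and the section contents are hypothetical examples, not a required schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a starter AGENTS.md without overwriting one.
init_agents_md() {
  local repo_root="${1:-.}"
  local file="$repo_root/AGENTS.md"
  if [ -e "$file" ]; then
    echo "AGENTS.md already exists, leaving it alone" >&2
    return 0
  fi
  cat > "$file" <<'EOF'
# Project conventions

## Build and test
- Run the test suite before proposing a change.

## Style
- Match the existing formatting; do not reformat unrelated files.
EOF
}
```

Because the file is plain Markdown, the same conventions stay readable for humans and for any compatible agent.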
MCP and web search. Codex connects to external tools over MCP and can search the web mid-task for current information.
What's New in Codex (Last 30 Days)
Codex ships fast. June alone moved the CLI from v0.139 to v0.141. What actually matters for builders:
- Goal mode graduated from experiment. /goal is now GA across the CLI, IDE extension, and app. Give Codex a milestone and it works toward it for hours or days while you check in, steer, or pause. OpenAI's dev team shared a real tip: "start side chats to understand the work done so far without interrupting the main task." This is OpenAI's answer to the same goal-loop pattern Claude Code uses.
- /import migrates you from Claude Code. v0.140 added /import to pull your setup, project config, and recent chats out of Claude Code. If you are switching or running both, you do not start from zero.
- /usage token accounting. Daily, weekly, and cumulative token activity in one view. Useful, given how opaque the credit ledger is.
- Encrypted remote executors. v0.141 moved remote execution onto end-to-end encrypted channels. This is the groundwork for running Codex against a remote box instead of your laptop.
- Web search in the CLI and rate-limit banking (bank unused resets, deploy capacity when you need it) round out the June additions.
Goal mode lives or dies by how you steer it. The tips and mistakes to avoid from walkthroughs of the /goals loop map directly onto how you run Codex goal mode, since both are the same hands-off goal-loop pattern.
Practical Patterns Builders Are Using Right Now
From a last-30-days scan of Reddit, X, and YouTube, three patterns are getting the most traction.
Run it 24/7 on a server. The most-shared workflow this month: put Codex CLI on a cheap VPS so it keeps shipping while your laptop is closed. Tech With Tim walks through it. Combine it with codex exec and the new encrypted remote executors and you have an always-on agent.
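The always-on pattern reduces to a loop around `codex exec`. This is a minimal sketch, not Tech With Tim's setup: the function name, the one-task-per-cycle design, and the `CODEX_BIN` override are all hypothetical, and in practice you would run it under tmux, systemd, or cron rather than a bare shell.

```shell
#!/usr/bin/env bash
# Hypothetical always-on loop: one codex exec run per cycle.
# Usage: run_loop PROMPT [INTERVAL_SECONDS] [MAX_CYCLES]  (MAX_CYCLES=0 runs forever)
run_loop() {
  local prompt="$1" interval="${2:-3600}" max_cycles="${3:-0}" cycle=0
  while :; do
    cycle=$((cycle + 1))
    echo "[cycle $cycle] $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
    # One failed run should not kill the loop; log its exit status and continue.
    "${CODEX_BIN:-codex}" exec "$prompt" || echo "[cycle $cycle] codex exited with $?" >&2
    if [ "$max_cycles" -gt 0 ] && [ "$cycle" -ge "$max_cycles" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

Logging failures instead of exiting is the key design choice: an unattended box should survive one bad run and try again next cycle.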
Drive Codex from another agent. A composition trick builders are sharing: have Claude Code (or Codex itself) write a shell script that calls codex exec, so one agent runs the other and can switch effort level or model, or pass --last to resume (@infektyd). exec is the headless entry point that makes this work.
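Here is a sketch of the kind of script one agent can call to hand work to Codex. It assumes `codex exec` accepts `-m <model>` and `-c key=value` config overrides, and that `model_reasoning_effort` is the config key for effort level; confirm both with `codex exec --help` on your version. The function name and `CODEX_BIN` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical delegation wrapper: run a Codex task with a chosen model and effort.
# Usage: delegate_to_codex MODEL EFFORT PROMPT...
delegate_to_codex() {
  if [ "$#" -lt 3 ]; then
    echo "usage: delegate_to_codex MODEL EFFORT PROMPT..." >&2
    return 2
  fi
  local model="$1" effort="$2"
  shift 2
  "${CODEX_BIN:-codex}" exec -m "$model" -c "model_reasoning_effort=$effort" "$@"
}

# Example: delegate_to_codex gpt-5-codex high "Add tests for the parser module"
```

Exposing model and effort as positional arguments lets the calling agent pick them per task without editing config files.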
Stop picking one - orchestrate both. The loudest 30-day trend is not "Codex vs Claude Code," it is tooling to run them together. Fresh repos like Omnigent (1.8k stars, launched June 11) and golutra (3.7k stars) put Claude Code, Codex, and Gemini CLI on one control plane, while tools like Crossthreads and agentsview do cross-tool search and cost tracking. The consensus workflow: Codex for the cheap autonomous grind, Claude Code for the hard refactor, one dashboard for spend.
Codex CLI vs Claude Code: The Honest Version
This is the question you actually came for. The short answer from independent 2026 testing: they optimize for different things, and the best builders run both.
The 30-day mood is a genuine comeback story. Nate Herk's 100-hour test (88K views) opens with it bluntly: people "basically forgot OpenAI existed, thanks to tools like Claude Code," then found Codex "actually better" over the past month. His receipts: Codex on GPT-5.5 high ran a 256K context window and finished three test runs in about 26 minutes, and he calls the $100 ChatGPT tier "one of the best values in AI coding agent market right now." Treat any single creator's benchmark as directional, but the sentiment shift is real and worth knowing before you commit.
| Benchmark | Claude Code | Codex CLI | Winner |
|---|---|---|---|
| SWE-bench Verified | ~80.9% | ~80% | Claude (marginal) |
| Terminal-Bench 2.0 | 65.4% | 77.3% | Codex |
| Blind code-quality review | 67% win | 25% win | Claude |
| Token efficiency | baseline | ~4x fewer | Codex |
| Raw speed | moderate | 240+ tok/s (Spark: 1000+) | Codex |
Two numbers drive most of the decision. Codex burns roughly 4x fewer tokens for the same task. In one Figma-to-code test that meant 1.5M tokens versus 6.2M. At API rates the same job can cost several times more on Claude Code. But in blind reviews, engineers rated Claude Code's output cleaner 67% of the time versus 25% for Codex.
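The "~4x" figure is just the ratio of the two token counts. This one-liner helper (hypothetical name) reproduces it; how that ratio translates into dollars depends on each tool's per-token rates, which it does not model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many times more tokens B used than A, to one decimal.
# Usage: token_ratio TOKENS_A TOKENS_B
token_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { if (a <= 0) exit 1; printf "%.1f\n", b / a }'
}
```

For the Figma-to-code test, `token_ratio 1500000 6200000` prints `4.1`.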
So:
Reach for Codex when the work is terminal-native (scripts, DevOps, CI/CD), well-scoped, autonomous, or cost-sensitive. The token efficiency and kernel sandbox make it the better unattended grinder.
Reach for Claude Code when the change touches many files, the dependency graph matters, code quality is non-negotiable, or you need its deeper hook-based governance. It is the better thinking partner for complex features.
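Those two rules can be written down as a tiny triage helper. The function name, the trait labels, and the precedence (any Claude Code trait wins, since scope and quality outweigh cost) are assumptions for illustration, not a published rubric.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: pick a tool from task traits named in the text.
# Usage: pick_tool TRAIT...   e.g. pick_tool terminal-native cost-sensitive
pick_tool() {
  local trait needs_claude=0
  for trait in "$@"; do
    case "$trait" in
      many-files|dependency-graph|quality-critical|needs-hooks) needs_claude=1 ;;
    esac
  done
  if [ "$needs_claude" -eq 1 ]; then
    echo "claude-code"
  else
    echo "codex"   # terminal-native, well-scoped, autonomous, or cost-sensitive
  fi
}
```

Defaulting to Codex reflects its role as the cheap grinder; anything that touches the dependency graph or demands top code quality escalates.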
If that tradeoff sounds familiar, it is the same logic in our Claude Code vs Cursor breakdown: stop asking which tool is best, start asking which is best for the task in front of you. Many builders run Codex for the cheap autonomous grind and Claude Code for the high-stakes refactor.
The Real Takeaway
Codex CLI is not a Claude Code killer and Claude Code is not a Codex killer. They are two takes on the same machine: a model wired into a loop with tools, sandboxing, and a verifier. Once you can see that shape, every one of these tools reads the same way, and picking between them stops being a vibe and starts being a decision.
That is the whole point of learning how the engine works instead of just driving it. If you want to go all the way and build one yourself - the loop, the tools, the permissions, the sandbox - that is what we teach inside the agent engineering course.
Sources & Verification
Synthesized from OpenAI's official Codex docs (CLI, pricing, changelog v0.141.0) and a June 2026 third-party knowledge base tracking Codex auth, billing, and model changes. The 'what's new' and 'practical patterns' sections are drawn from a last-30-days scan of Reddit, X, YouTube, and the openai/codex GitHub repo (June 2026). Benchmark and community numbers are from independent third-party tests and creators cited below, not our own runs - treat them as directional. Model IDs and prices are as of June 2026 and change fast; verify against the official pricing page before you budget. See our editorial standards.
- Codex CLI (OpenAI Developers) - Official: install, models, approval modes, subagents, exec scripting, MCP, cloud tasks
- Codex Pricing (OpenAI Developers) - Official credit and API token pricing for GPT-5.5, GPT-5.4, GPT-5.4 mini
- Codex Changelog (OpenAI Developers) - Version history; v0.141.0 (June 2026) feature list
- Codex CLI's Two Worlds: Authentication Paths (Codex Knowledge Base) - How ChatGPT login vs API key changes billing, model access, and rate limits (June 2026)
- GPT-5-Codex June 14 Refresh and Model Selection (Codex Knowledge Base) - gpt-5-codex spec, 400K context, $1.25/$10 pricing, model decision tree
- Codex vs Claude Code CLI comparison (Particula) - Two-week head-to-head: Terminal-Bench 77.3% vs 65.4%, ~4x token efficiency
- 100 Hours Testing Claude Code vs ChatGPT Codex (Nate Herk, YouTube) - 88K-view practitioner test (May 2026): the 'Codex comeback' sentiment, 256K context on GPT-5.5 high, $100-tier value, ~26 min across 3 runs. One creator's results, directional.
- I Gave Codex a 24/7 Server (Tech With Tim, YouTube) - June 2026 walkthrough of running Codex CLI continuously on a VPS - the always-on autonomous pattern
- Goal mode graduated from experiment (@OpenAIDevs) - May 2026: /goal now GA across CLI, app, and IDE; runs for hours or days with check-in and pause