Codex + Claude Code: The Paired-Agent Stack for 2026
When YC, HN, and GitHub converge on 'thin harness, fat skills' in 48 hours, the single-agent era ends. Here is how to wire Codex and Claude Code together.
In a 48-hour window this week, three independent surfaces converged on the same answer to the question "what does the agent harness need next?" Y Combinator's Lightcone shipped a long-form episode titled Thin Harness, Fat Skills: The New Way To Build Software. The top of Hacker News carried Agents need control flow, not more prompts — 557 points and 270 comments by morning. And GitHub trending held addyosmani/agent-skills at #2 for a second consecutive day, gaining another 1,794 stars. Three surfaces, three angles, one thesis: skills + control flow, not bigger models.
The operator-side answer to that thesis has a name now, and it is not "use Claude Code" or "use Codex." It is use both, paired. Chase AI shipped three videos in 24 hours pushing exactly this framing — the most direct of them is STOP Using Claude Code OR Codex. GitHub's official blog landed in the same lane this week with Pick your agent: Use Claude and Codex on Agent HQ. And on the ground, GitHub trending #3 — farion1231/cc-switch — describes itself as "All-in-One assistant tool for Claude Code, Codex, OpenCode, openclaw & Gemini CLI" while #4, decolua/9router, routes between them automatically.
This article is a counter-position to most of ComputeLeap's existing how-to catalogue. We have written single-agent guides for Claude Code, the agent harness, and skills as a developer primitive. The convergence reframes those: the unit is no longer one agent. It is a pair — substrate and driver, planner and executor, builder and adversarial reviewer. If you are running solo or on a 2-person team in 2026, this is the default operator stack you should be building on top of.
Why the Single-Agent Era Just Ended
The Lightcone framing matters because Y Combinator shipped it from the top. Garry Tan's pitch in the episode is that "a single person with AI agents can build what used to require entire teams" — the YC-blessed founder messaging that maps onto the harder operator argument made on Hacker News the same week. The HN top comment on the control-flow piece reads: "1000% agree... increasingly hesitant to believe Anthropic's continual war drum of 'build for future models.'" That is the operator pushback to the "just wait for the next model" stance. The community is not waiting. It is wiring control flow itself.
The YC Lightcone framing — "thin harness, fat skills" — is doing the same work on the founder side that "control flow > prompts" is doing on the operator side. Both reject the model-centric story and locate the leverage in the layer above the model. When the layer above the model is the unit, "use one agent" stops being the natural default.
Addy Osmani's agent-skills repository is the implementation surface of the same thesis from a Google Chrome eng-lead. The repo is not a model — it is a corpus of "production-grade engineering skills for AI coding agents" packaged so different agents can share them. That packaging is what makes paired-agent setups possible: if both Claude Code and Codex can load the same skill file, the question of which agent runs the skill becomes a decision-time choice rather than a setup-time lock-in.
OpenAI shipped a Codex plugin for Claude Code in the same window. That plugin, plus OpenAI's Subagents documentation, is the official signal that the two-agent default has cleared the lab-to-tool transition. You no longer have to choose. Both major labs now ship integrations for the other lab's tool.
The Four Modes of Paired-Agent Handoff
Chase AI's blog post Claude Code + Codex Plugin: Adversarial Review Setup names the four practical handoff modes that emerge once you have both agents available in the same workspace.
Generalized away from the specific plugin, they form an operator-grade decision matrix:
1. Standard code review. Claude Code writes the change. Codex reviews it before you commit. The asymmetry is intentional: the agent that wrote the diff is the worst auditor of the diff. A second agent with a different training distribution catches different classes of mistakes.
2. Adversarial review. Same handoff as above, but Codex is prompted to try to break the change rather than to validate it. This is the mode that matters most for production code. Claude Code's built-in self-review tends to confirm its own assumptions; an agent from a different lab does not share those assumptions.
3. Codex rescue. Claude Code has hit a usage limit, gotten stuck on a planning loop, or is mis-routing context. You hand the entire task — not just a sub-step — to Codex with a fresh window and the original CLAUDE.md skill set. The Substack column Claude Code, Codex and Agentic Coding #8 describes this as the most under-used mode, because operators reflexively keep retrying with the agent they started with.
4. Status check. Lightweight: Claude Code is running a long task; Codex is asked "is this on track?" against the original spec. Not a review and not a rescue — just a second pair of eyes on whether the trajectory still matches the goal.
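The four modes reduce to a small dispatch table once you treat the handoff as control flow rather than a prompt. A minimal Python sketch, with the actual agent invocation stubbed out — HandoffMode, run_codex, and handoff are illustrative names, not part of any plugin API:

```python
from enum import Enum

class HandoffMode(Enum):
    REVIEW = "review"            # standard post-write code review
    ADVERSARIAL = "adversarial"  # try to break the change, not validate it
    RESCUE = "rescue"            # hand the whole task to a fresh Codex window
    STATUS = "status"            # lightweight trajectory check against the spec

def run_codex(prompt: str) -> str:
    """Stub for a real Codex invocation (e.g. via the plugin's /codex commands)."""
    return f"[codex] {prompt[:40]}..."

def handoff(mode: HandoffMode, diff: str = "", spec: str = "", task: str = "") -> str:
    if mode is HandoffMode.REVIEW:
        return run_codex(f"Review this diff before commit:\n{diff}")
    if mode is HandoffMode.ADVERSARIAL:
        return run_codex(f"Try to break this change; list failure cases:\n{diff}")
    if mode is HandoffMode.RESCUE:
        # Full task, fresh context window -- not just a sub-step.
        return run_codex(f"Take over this task from scratch:\n{task}")
    return run_codex(f"Is the work on track against this spec?\n{spec}")
```

The point of the sketch: which mode fires, and when, is decided by your code, not by either model.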
These four modes are what the GitHub Agent HQ positioning is trying to formalize at the platform level. GitHub's framing — "run Claude and Codex agents locally or in the cloud under your same Copilot subscription" — is the billing layer that makes the four-mode pattern affordable for solo developers.
Setting Up the Stack
There are three layers to a working paired-agent setup. You do not need all three on day one, but each one removes friction the next layer would expose.
Layer 1 — Codex Plugin Inside Claude Code
This is the lowest-friction starting point and the one most readers should adopt this week.
```
# In your Claude Code session
/plugin install codex
```
The plugin exposes Codex through a slash-command interface — /codex review, /codex rescue, /codex check. Claude Code stays the front door. Codex becomes a callable subagent. The mental model is simple: Claude Code is the planner you talk to; Codex is the second opinion you call.
If you are already running Anthropic's claude-code-* stack, this is a one-command upgrade. Your existing CLAUDE.md, hooks, and MCP servers continue to work. The only thing that changes is that you now have a /codex namespace.
The Codex plugin reads your existing CLAUDE.md as project context. You do not need a separate Codex config for the common case. Maintain one skill set; let both agents read it. This is the practical realization of the "fat skills" half of the YC framing — skills are the shared substrate; agents are interchangeable drivers.
Layer 2 — Provider/Agent Switching with cc-switch
Once you have both agents available, you start wanting to switch between them at the terminal level rather than the slash-command level. That is what cc-switch does. The repo's own description — "All-in-One assistant tool for Claude Code, Codex, OpenCode, openclaw & Gemini CLI" — captures the scope. cc-switch has held a spot on the GitHub trending list for two days running, adding 1,282 stars a day as of this writing.
The practical workflow we recommend:
- Claude Code as default planner. It is the agent with the most mature CLAUDE.md ecosystem and the strongest planning loop.
- Codex as default executor for tightly-scoped, high-volume tasks. Developers Digest's April-2026 comparison summarizes the asymmetry well: "use Claude Code for planning and architectural decisions, then use Codex for tightly scoped follow-up tasks." Codex's GPT-5.5 backing produces tighter, lower-token output for narrow tasks.
- cc-switch as the system-tray switch. Sub-50ms switching means you can move between agents without breaking flow.
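The planner/executor split in the workflow above can be sketched as a simple loop: short, high-leverage planning on one agent, high-volume execution on the other. claude_plan and codex_execute are stand-ins for real agent calls, not a cc-switch API:

```python
def claude_plan(goal: str) -> list[str]:
    """Stub planner: in practice, Claude Code produces the step list."""
    return [f"step {i}: part of {goal}" for i in range(1, 4)]

def codex_execute(step: str) -> str:
    """Stub executor: in practice, Codex handles each tightly-scoped step."""
    return f"done: {step}"

def run(goal: str) -> list[str]:
    results = []
    for step in claude_plan(goal):           # low-token, high-leverage planning
        results.append(codex_execute(step))  # high-volume, lower-cost execution
    return results
```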
Layer 3 — Multi-Agent Workspaces with claude-squad
smtg-ai/claude-squad is the most aggressive version of the pattern: it runs Claude Code, Codex, OpenCode, and Amp concurrently, in separate workspaces, and lets you supervise them all from one terminal. This is the right setup for operators who are running 3+ tasks in parallel — typically founders shipping multiple feature branches in a day.
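The claude-squad idea — several agent workspaces running concurrently under one supervisor — can be approximated with stdlib concurrency. A hedged sketch only: run_agent is a stub, not claude-squad's actual interface:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(agent: str, task: str) -> str:
    """Stub for launching one agent in its own workspace on one branch."""
    return f"{agent}: finished {task}"

# One entry per concurrent task, as in a multi-branch day.
tasks = [("claude-code", "feature-a"), ("codex", "feature-b"), ("opencode", "feature-c")]

with ThreadPoolExecutor() as pool:
    # Supervise all runs from one place; results come back in task order.
    results = list(pool.map(lambda t: run_agent(*t), tasks))
```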
For routing across providers — including the FREE-tier endpoints exposed by decolua/9router — the emerging pattern is to use 9router as the upstream provider and let cc-switch pick the agent. 9router's claim is "RTK -40% tokens" — a token-economics argument that is closely aligned with the broader RTK consolidation we wrote about earlier this week.
When To Call Codex vs Claude Code — The Decision Matrix
The most common mistake we see is treating "paired" as "either, randomly." It is not. Each agent has its own sharpest edge:
| Task type | Default agent | Why |
|---|---|---|
| New-feature planning, architecture | Claude Code | Stronger planning loop; richer CLAUDE.md ecosystem; better at multi-file reasoning |
| Tight, single-file refactor | Codex | GPT-5.5 produces lower-token, more surgical diffs |
| Adversarial code review of own work | Codex | Different training distribution; catches different mistakes |
| Long-running async background task | Codex (cloud) | OpenAI's cloud subagent infrastructure is more mature for parallel async |
| Local interactive debugging | Claude Code | Tighter loop with the project filesystem and hooks |
| Test scaffolding | Either | Pick by skill file already loaded; this is a skill-driven decision, not an agent-driven one |
| "I am stuck — restart" | Codex (rescue mode) | Fresh planning window + different distribution often unblocks |
The skill drives the choice in some of these rows. That is the practical reading of the YC "fat skills" framing — once your skill files are mature, the agent becomes a swap-in component. This is also what makes the paired stack cheaper than scaling a single agent: you are not paying for "the more capable model" — you are paying for the right model on each step.
Skills and Control Flow as Connective Tissue
The paired-agent stack does not work without two pieces of glue:
Skills. Both agents must be able to read the same skill files. Addy Osmani's agent-skills repository is the canonical example — production-grade skills written in a format that both Claude Code and Codex can load. The repo is not framework-specific. That is the point. If your skills only run inside one agent, you have not built a paired stack — you have built two siloed agents.
Control flow. This is the half the Hacker News piece argues for most directly. Agents need control flow, not more prompts makes the case that the missing primitive in 2025-era agent setups was deterministic flow control between steps — which step to run next, what condition triggers a handoff, when to stop. Paired-agent setups need this more than single-agent setups, because the handoff between Claude Code and Codex is itself a control-flow decision. The HN top comment captures the impatience: operators have stopped waiting for "build for future models" and are wiring control flow themselves.
A common failure mode of paired stacks is letting Claude Code "decide" when to call Codex via prompt engineering. This is the anti-pattern the HN piece is warning about. The handoff should be explicit control flow — a hook, a slash command, a scheduled subagent — not a prompt.
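What explicit control flow looks like in practice: a deterministic post-edit hook that always hands the working diff to a reviewer, rather than letting a prompt decide whether to. A hedged sketch only — post_edit_hook and codex_review are hypothetical, and the real wiring would go through Claude Code's hook configuration:

```python
import subprocess

def post_edit_hook(get_diff=lambda: subprocess.check_output(["git", "diff"], text=True)) -> str:
    """Deterministic handoff: the hook itself is the trigger, not a prompt.
    get_diff is injectable so the control flow is testable without a repo."""
    diff = get_diff()
    if not diff.strip():
        return "no changes to review"
    return codex_review(diff)

def codex_review(diff: str) -> str:
    # Stub; in practice this would shell out to the Codex plugin (/codex review).
    return f"review requested for {len(diff.splitlines())} changed lines"
```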
The Cost Ledger
The paired-agent stack is cheaper than the single-most-capable-model stack, not more expensive. The math:
- Claude Code on Opus 4.7 for planning: high per-token cost, low total tokens (planning is short).
- Codex on GPT-5.5 for execution: lower per-token cost, higher total tokens (execution is long).
Routing the long, high-token work to the cheaper agent and the short, high-leverage work to the more capable one is what makes the pair affordable. Combined with the cost-routing offered by 9router and free-tier providers we covered last week, a paired-agent operator stack costs less per shipped feature than a maxed-out single-agent stack at scale.
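A back-of-envelope version of the ledger. The per-million-token rates and token counts below are assumptions for illustration, not published pricing:

```python
def stack_cost(plan_tokens: int, exec_tokens: int, plan_rate: float, exec_rate: float) -> float:
    """Cost in dollars; rates are $ per 1M tokens (assumed, not published)."""
    return (plan_tokens * plan_rate + exec_tokens * exec_rate) / 1_000_000

# Assumed shape: short planning on the pricier model, long execution elsewhere.
paired = stack_cost(plan_tokens=50_000, exec_tokens=2_000_000, plan_rate=30.0, exec_rate=5.0)
single = stack_cost(plan_tokens=50_000, exec_tokens=2_000_000, plan_rate=30.0, exec_rate=30.0)
print(f"paired ~ ${paired:.2f} vs single-model ~ ${single:.2f} per feature")
```

Under these assumed numbers the execution leg dominates total tokens, so routing it to the cheaper model is where nearly all the savings come from.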
This is also the read on why GitHub bundled Claude and Codex under one Copilot subscription — they understood the same math and stopped trying to charge twice.
What Changes For ComputeLeap Readers
If you are running ComputeLeap-recommended Claude Code today, here is the upgrade path for this week:
- Install the Codex plugin inside Claude Code (/plugin install codex). One command. This alone gives you 80% of the paired-agent value.
- Move at least one routine handoff — code review, or "rescue when I get stuck" — to Codex. Not all of them. Just one. Build the muscle.
- Adopt one skill from agent-skills and verify it loads in both agents. This is the test that you have skills, not just prompts.
- Install cc-switch when slash-command switching starts to feel slow. Not before.
- Layer on claude-squad only when you are running 3+ tasks concurrently and supervising them is the bottleneck.
The framing locks in for the next two quarters. When YC's Lightcone names a pattern and HN's top operator commentary aligns and a high-profile Google-eng repo ships the implementation in the same week, the language is settled. "Skill" is going to be the dominant unit of agent-design vocabulary by Q3. The single-agent era is not coming back. The right operator question is no longer which agent — it is which pair.
If you want the related deep-dives, our Claude Code agentic dev stack guide covers the single-agent baseline, Codex Goal Absorbs the Agent Harness covers the strategic backdrop, and Harness Engineering as a Developer Skill covers the discipline emerging around control flow.
About ComputeLeap Team
The ComputeLeap editorial team covers AI tools, agents, and products — helping readers discover and use artificial intelligence to work smarter.