18 February 2026
Building with Two AIs: How Claude and Codex Ship Together
Most people use one AI. We use two. Here's why, and how it works.
The Problem with Single-Agent Development
Codex (OpenAI) is fast. Like, frighteningly fast. Give it a spec and it'll ship working code in minutes.
But here's what happens:
You: Build a travel app with weather integration
Codex: [writes 500 lines of code]
You: Does it use live weather?
Codex: Yes!
You: [checks code] This is mock data
Codex: You're right, I can fix that
You: [checks fix] You imported the module but didn't call it
Codex: Good catch, updating now
This cycle repeats. Codex says it did the thing. But it skipped the hard parts. Not maliciously—LLMs just optimize for plausible-sounding outputs.
You need a second set of eyes. Preferably eyes that can read code and call bullshit.
Enter Claude
Claude (Anthropic) reads code as carefully as Codex writes it quickly.
Where Codex optimizes for "ship it," Claude optimizes for "is this right?"
In FOMO Sun, Claude:
- Audits the codebase and compares it to the docs
- Catches when we said we'd build feature X but actually built Y
- Writes specs that Codex can't misinterpret
- Updates documentation after Codex ships
- Asks clarifying questions when the founder's request is ambiguous
Critically: Claude doesn't write production code. That's intentional.
If Claude wrote code, we'd have two agents racing to implement. Instead, Claude is the product manager and Codex is the engineer. Clean separation.
The Workflow
Here's how a typical feature ships:
Step 1: Founder Describes Problem (Human)
"The joystick feels too floaty. When I flick it, it wobbles before snapping to the bucket. Makes it feel imprecise."
Not a spec. Not a ticket. Just: this feels wrong.
Step 2: Claude Diagnoses (AI PM)
Claude reads:
- The current page.tsx with joystick physics
- The PM journal with previous joystick tuning notes
- The release changelog showing V55 introduced spring-back
Claude responds:
"This is a spring constant issue. Current config: spring: 0.2, damping: 0.5. That's on the loose end. For a more rigid, precise feel, we want higher stiffness and damping. Recommend: spring: 0.3, damping: 0.7 and test. This should reduce wobble while preserving the return-to-center physics."
That's a spec Codex can't misinterpret.
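You can even sanity-check a spec like that offline. Here's a minimal damped-spring simulation (simple semi-implicit Euler steps in abstract time units) showing that the stiffer config settles with less wobble. This is an illustrative model with a hypothetical `settleSteps` helper, not FOMO Sun's actual joystick code:

```typescript
// Minimal damped-spring model: x is displacement from center after a
// flick, v is velocity. Returns how many simulation steps the stick
// takes to settle near center. Illustrative only -- not the app's code.
function settleSteps(spring: number, damping: number): number {
  let x = 1; // flicked fully off-center
  let v = 0;
  for (let step = 0; step < 1000; step++) {
    const accel = -spring * x - damping * v; // spring pull + damping drag
    v += accel;
    x += v;
    if (Math.abs(x) < 0.01 && Math.abs(v) < 0.01) return step; // settled
  }
  return 1000; // never settled within the window
}

const loose = settleSteps(0.2, 0.5); // V55 values
const tight = settleSteps(0.3, 0.7); // Claude's recommendation
```

Running both configs through the model, the tighter one returns to center in fewer steps, which is exactly the "less wobble, same return-to-center" behavior the spec promises.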
Step 3: Codex Implements (AI Engineer)
Claude hands off to Codex:
Prompt: "Update joystick spring physics to spring: 0.3, damping: 0.7. File is src/app/page.tsx, look for the motion.div config around line 450. Test that npm run build passes. Deploy as V57."
Codex ships in 3 minutes:
- Finds the exact line
- Updates the values
- Runs build
- Commits with message V57: Tighten joystick spring physics for more precise feel
- Pushes to main
- Vercel auto-deploys
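What makes that handoff work is its shape: one exact change, one file, one verification command, one version tag. A sketch of that contract as code, where the `Handoff` type and `renderHandoff` are hypothetical illustrations, not part of the actual workflow's tooling:

```typescript
// A good handoff prompt leaves Codex nothing to interpret.
// These names are illustrative, not a real tool.
interface Handoff {
  change: string;  // the exact edit, with literal values
  file: string;    // where to make it
  verify: string;  // command that must pass before deploy
  version: string; // release tag for changelog and rollback
}

function renderHandoff(h: Handoff): string {
  return `${h.change}. File is ${h.file}. ` +
    `Test that ${h.verify} passes. Deploy as ${h.version}.`;
}

// The V57 handoff from above, expressed through the template:
const prompt = renderHandoff({
  change: "Update joystick spring physics to spring: 0.3, damping: 0.7",
  file: "src/app/page.tsx",
  verify: "npm run build",
  version: "V57",
});
```

If any of those four fields is missing, the engineer agent has to guess, and guessing is where the mock-data loop from the top of this post comes from.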
Step 4: Claude Updates Docs (AI PM)
Codex doesn't update docs. Claude does.
After V57 deploys, Claude:
- Adds V57 entry to Release Changelog
- Updates PM Journal: "Deployed V57 - tighter joystick feel"
- Updates Build Log with commit SHA and rollback command
All three docs stay in sync. No human intervention.
Step 5: Founder Tests (Human)
I open fomosun.com on my phone. Flick the joystick.
Feels better? Ship it. Still wrong? Back to step 1 with an updated description.
No tickets. No PRs. No "let me check with the team." Just: try it, describe what's wrong, agents fix it.
Why This Works
1. Specialization Beats Generalization
Claude is better at:
- Understanding context
- Reading documentation
- Catching inconsistencies
- Writing specs
- Asking questions
Codex is better at:
- Writing code fast
- Finding the right file
- Implementing without overthinking
- Staying focused on the task
By letting each AI do what it's best at, we get better results than either could achieve alone.
2. Notion as Shared Memory
Both agents read the same Notion workspace:
- Release Changelog (every version)
- Build Log (every deployment)
- PM Journal (every decision)
- Blog posts (what we're learning)
This shared memory prevents:
- Codex reimplementing features that already exist
- Claude suggesting fixes that were already tried
- Founder repeating feedback from two weeks ago
The agents don't have memory between sessions. Notion is their memory.
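Here's what that shared memory might look like as data. The field names are illustrative guesses, not FOMO Sun's actual Notion schema:

```typescript
// Hypothetical shapes for the Notion-backed shared memory. Each agent
// session loads this state instead of relying on chat history, which
// resets between sessions.
interface ChangelogEntry {
  version: string;   // e.g. "V57"
  summary: string;   // one-line description of the change
}

interface BuildLogEntry {
  version: string;
  commitSha: string; // enables one-command rollback
  deployedAt: string;
}

interface JournalEntry {
  date: string;
  decision: string;  // why, not just what
}

interface SharedMemory {
  changelog: ChangelogEntry[];
  buildLog: BuildLogEntry[];
  journal: JournalEntry[];
}

// A new session checks the changelog before proposing work, which is
// what prevents reimplementing a feature that already shipped.
function alreadyShipped(memory: SharedMemory, keyword: string): boolean {
  return memory.changelog.some((e) =>
    e.summary.toLowerCase().includes(keyword.toLowerCase())
  );
}
```

The exact schema matters less than the habit: if a decision isn't written into the shared workspace, the next session never happened to know it.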
3. Founder as Router, Not Micromanager
I don't write specs. I describe feelings.
"This feels slow" → Claude figures out why → Codex fixes it
"This is confusing" → Claude proposes UX changes → Codex implements
"This breaks on mobile" → Claude audits mobile layout → Codex patches
My job is to:
- Use the product
- Notice what feels wrong
- Route to the right agent
That's it. No tickets. No grooming. No sprint planning.
Real Example: The V41-V45 Rethink
V40 worked. But it didn't feel right. The UI was too warm and friendly. The data was buried. It felt like a consumer app, not a decision tool.
Here's how we fixed it:
Founder to Claude:
"The app feels too friendly. I want it to feel more like a tool—data-forward, cool tones, fast decisions. Think: terminal aesthetics meet Swiss design."
Claude's Response:
- Audited current UI (warm palette, large cards, lots of copy)
- Proposed: rotating origin sentences, Tinder-style centered slider, score breakdown transparency, cool slate palette, monospace for data
- Wrote 5-version roadmap: V41 (scoring transparency), V42-44 (UX rebuild), V45 (admin polish)
- Created detailed spec for each version
Codex's Execution:
- V41: Added score_breakdown to API, surfaced in cards (30 minutes)
- V42-44: Complete UI rewrite, new component library, slate palette (2 hours)
- V45: Admin diagnostics upgrade with comparison mode (45 minutes)
Total time: One afternoon. 5 versions deployed.
That's the power of the relay. Claude thinks deeply. Codex ships fast. Neither slows the other down.
The Challenges
1. Context Limits
Both AIs have finite context windows. After 50+ versions, the full changelog doesn't fit.
Solution: Claude maintains a summary at the top. Codex reads the summary, not the full history.
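The strategy is simple enough to express directly: a standing summary plus only the newest entries, so the prompt stays under a fixed budget. `buildContext` here is a hypothetical helper, not a real tool:

```typescript
// Keep the agent's context bounded: one standing summary, then only
// the newest N changelog entries. Older history stays in Notion, not
// in the prompt. Illustrative sketch, not the actual tooling.
function buildContext(
  summary: string,
  entries: string[],
  maxEntries = 10
): string {
  const recent = entries.slice(-maxEntries); // newest entries only
  return [summary, ...recent].join("\n");
}

// With 57 versions logged, only the last 10 make it into the prompt:
const entries = Array.from({ length: 57 }, (_, i) => `V${i + 1}: ...`);
const context = buildContext(
  "Summary: joystick-driven weather decision tool.",
  entries
);
```

The tradeoff is that the summary itself has to be maintained, which is exactly the kind of bookkeeping the PM agent is good at and the engineer agent skips.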
2. Agent Handoff Friction
Claude and Codex don't talk directly. I'm the router.
This creates friction: I have to copy/paste prompts between interfaces.
Solution: Clear role boundaries. Claude always produces specs. Codex always produces code. No ambiguity about who does what.
3. When Agents Disagree
Rarely, Claude's spec and Codex's implementation don't match.
Example: Claude says "add fog-risk heuristic," Codex adds a simple visibility check instead of the full heuristic.
Solution: Claude audits after deploy. Catches the mismatch. Writes a correction spec. Codex fixes it in the next version.
This self-corrects in 1-2 versions. Faster than human code review.
What This Enables
89 versions in 7 days.
That's the headline number. But here's what it actually means:
Velocity
When you can ship 10-15 versions per day, you can:
- Test product hypotheses live
- Iterate on feel, not just function
- Build trust through polish
- Respond to feedback same-day
Quality
Two agents means two perspectives:
- Codex optimizes for shipping
- Claude optimizes for correctness
- The tension produces better code than either alone
Documentation
Claude keeps docs in sync. This means:
- Onboarding a new agent takes 10 minutes (read the changelog)
- Debugging is faster (check the build log)
- Context never gets stale (PM journal is always current)
Focus
I spend ~2 hours per day on FOMO Sun:
- 30 minutes using the app
- 30 minutes routing to agents
- 60 minutes testing what they shipped
That's it. The rest happens while I'm doing other things.
The Future
This workflow will get better as AI improves. Some predictions:
Near-term (2026):
- Agents will read each other's outputs directly (no human router)
- Notion will become the coordination layer, not me
- Multi-agent swarms will handle complex features in parallel
Mid-term (2027):
- Agents will proactively suggest improvements based on usage data
- CI/CD will be fully agent-driven (no human approval)
- Product roadmaps will be agent-generated from user feedback
Long-term (2028+):
- Every solo founder will have a Claude + Codex team
- The bottleneck won't be implementation—it'll be vision
- Startups will ship 1000+ versions in their first year
How to Start
You don't need special tools. Here's the minimum setup:
1. Pick your agents:
- Claude for specs/audits/docs
- Codex/GPT-4 for implementation
2. Set up shared memory:
- Notion workspace (or similar)
- Version log
- Decision log
- Build log
3. Define clear roles:
- Human: vision, routing, testing
- Agent 1: spec, audit, document
- Agent 2: implement, deploy, iterate
4. Ship something small:
- V1: Get it working
- V2-V5: Fix what's broken
- V6+: Polish what works
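The whole minimum setup above fits in one config object. A sketch with illustrative names, not a prescribed format:

```typescript
// The setup from the four steps above, as data. Which agent fills
// which role is your choice; the role boundaries are the point.
const setup = {
  roles: {
    human: ["vision", "routing", "testing"],
    specAgent: ["spec", "audit", "document"],       // e.g. Claude
    buildAgent: ["implement", "deploy", "iterate"], // e.g. Codex
  },
  sharedMemory: ["version log", "decision log", "build log"],
} as const;
```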
By V10, you'll have a rhythm. By V50, you'll be unstoppable.
The Real Breakthrough
Two AIs aren't 2x better than one AI. They're 10x better.
Because the limiting factor in software isn't typing code. It's:
- Knowing what to build
- Catching what's broken
- Keeping everyone in sync
- Maintaining quality under speed
A single AI can't do all of that. Two AIs with clear roles? They absolutely can.
We proved it. 89 versions in 7 days. Live at fomosun.com.
Your turn.
FOMO Sun was built by Claude (Anthropic) and Codex (OpenAI), directed by one human founder. Follow the build at @fomosun.