18 February 2026
Building with Two AIs: How Claude and Codex Ship Together
Most people use one AI. We use two. Here's why, and how it works.
The Problem with Single-Agent Development
Codex (OpenAI) is fast. Like, frighteningly fast. Give it a spec and it'll ship working code in minutes.
But here's what happens:
You: Build a travel app with weather integration
Codex: [writes 500 lines of code]
You: Does it use live weather?
Codex: Yes!
You: [checks code] This is mock data
Codex: You're right, I can fix that
You: [checks fix] You imported the module but didn't call it
Codex: Good catch, updating now
This cycle repeats. Codex says it did the thing. But it skipped the hard parts. Not maliciously—LLMs just optimize for plausible-sounding outputs.
You need a second set of eyes. Preferably eyes that can read code and call bullshit.
Enter Claude
Claude (Anthropic) reads code as carefully as Codex writes it quickly.
Where Codex optimizes for "ship it," Claude optimizes for "is this right?"
In FOMO Sun, Claude:
- Audits the codebase and compares it to the docs
- Catches when we said we'd build feature X but actually built Y
- Writes specs that Codex can't misinterpret
- Updates documentation after Codex ships
- Asks clarifying questions when the founder's request is ambiguous
Critically: Claude doesn't write production code. That's intentional.
If Claude wrote code, we'd have two agents racing to implement. Instead, Claude is the product manager and Codex is the engineer. Clean separation.
The Workflow
Here's how a typical feature ships:
Step 1: Founder Describes Problem (Human)
"The joystick feels too floaty. When I flick it, it wobbles before snapping to the bucket. Makes it feel imprecise."
Not a spec. Not a ticket. Just: this feels wrong.
Step 2: Claude Diagnoses (AI PM)
Claude reads:
- The current page.tsx with joystick physics
- The PM journal with previous joystick tuning notes
- The release changelog showing V55 introduced spring-back
Claude responds:
"This is a spring constant issue. Current config: spring: 0.2, damping: 0.5. That's on the loose end. For a more rigid, precise feel, we want higher stiffness and damping. Recommend: spring: 0.3, damping: 0.7 and test. This should reduce wobble while preserving the return-to-center physics."
That's a spec Codex can't misinterpret.
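You can even sanity-check a spec like that offline. Here's a minimal damped-spring simulation (simple semi-implicit Euler steps in abstract time units) showing that the stiffer config settles with less wobble. This is an illustrative model with a hypothetical `settleSteps` helper, not FOMO Sun's actual joystick code:

```typescript
// Minimal damped-spring model: x is displacement from center after a
// flick, v is velocity. Returns how many simulation steps the stick
// takes to settle near center. Illustrative only -- not the app's code.
function settleSteps(spring: number, damping: number): number {
  let x = 1; // flicked fully off-center
  let v = 0;
  for (let step = 0; step < 1000; step++) {
    const accel = -spring * x - damping * v; // spring pull + damping drag
    v += accel;
    x += v;
    if (Math.abs(x) < 0.01 && Math.abs(v) < 0.01) return step; // settled
  }
  return 1000; // never settled within the window
}

const loose = settleSteps(0.2, 0.5); // V55 values
const tight = settleSteps(0.3, 0.7); // Claude's recommendation
```

Running both configs through the model, the tighter one returns to center in fewer steps, which is exactly the "less wobble, same return-to-center" behavior the spec promises.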
Step 3: Codex Implements (AI Engineer)
Claude hands off to Codex:
Prompt: "Update joystick spring physics to spring: 0.3, damping: 0.7. File is src/app/page.tsx, look for the motion.div config around line 450. Test that npm run build passes. Deploy as V57."
Codex ships in 3 minutes:
- Finds the exact line
- Updates the values
- Runs build
- Commits with message V57: Tighten joystick spring physics for more precise feel
- Pushes to main
- Vercel auto-deploys
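What makes that handoff work is its shape: one exact change, one file, one verification command, one version tag. A sketch of that contract as code, where the `Handoff` type and `renderHandoff` are hypothetical illustrations, not part of the actual workflow's tooling:

```typescript
// A good handoff prompt leaves Codex nothing to interpret.
// These names are illustrative, not a real tool.
interface Handoff {
  change: string;  // the exact edit, with literal values
  file: string;    // where to make it
  verify: string;  // command that must pass before deploy
  version: string; // release tag for changelog and rollback
}

function renderHandoff(h: Handoff): string {
  return `${h.change}. File is ${h.file}. ` +
    `Test that ${h.verify} passes. Deploy as ${h.version}.`;
}

// The V57 handoff from above, expressed through the template:
const prompt = renderHandoff({
  change: "Update joystick spring physics to spring: 0.3, damping: 0.7",
  file: "src/app/page.tsx",
  verify: "npm run build",
  version: "V57",
});
```

If any of those four fields is missing, the engineer agent has to guess, and guessing is where the mock-data loop from the top of this post comes from.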
Step 4: Claude Updates Docs (AI PM)
Codex doesn't update docs. Claude does.
After V57 deploys, Claude:
- Adds V57 entry to Release Changelog
- Updates PM Journal: "Deployed V57 - tighter joystick feel"
- Updates Build Log with commit SHA and rollback command
All three docs stay in sync. No human intervention.
Step 5: Founder Tests (Human)
I open fomosun.com on my phone. Flick the joystick.
Feels better? Ship it. Still wrong? Back to step 1 with an updated description.
No tickets. No PRs. No "let me check with the team." Just: try it, describe what's wrong, agents fix it.
Why This Works
1. Specialization Beats Generalization
Claude is better at:
- Understanding context
- Reading documentation
- Catching inconsistencies
- Writing specs
- Asking questions
Codex is better at:
- Writing code fast
- Finding the right file
- Implementing without overthinking
- Staying focused on the task
By letting each AI do what it's best at, we get better results than either could achieve alone.
2. Notion as Shared Memory
Both agents read the same Notion workspace:
- Release Changelog (every version)
- Build Log (every deployment)
- PM Journal (every decision)
- Blog posts (what we're learning)
This shared memory prevents:
- Codex reimplementing features that already exist
- Claude suggesting fixes that were already tried
- Founder repeating feedback from two weeks ago
The agents don't have memory between sessions. Notion is their memory.
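Here's what that shared memory might look like as data. The field names are illustrative guesses, not FOMO Sun's actual Notion schema:

```typescript
// Hypothetical shapes for the Notion-backed shared memory. Each agent
// session loads this state instead of relying on chat history, which
// resets between sessions.
interface ChangelogEntry {
  version: string;   // e.g. "V57"
  summary: string;   // one-line description of the change
}

interface BuildLogEntry {
  version: string;
  commitSha: string; // enables one-command rollback
  deployedAt: string;
}

interface JournalEntry {
  date: string;
  decision: string;  // why, not just what
}

interface SharedMemory {
  changelog: ChangelogEntry[];
  buildLog: BuildLogEntry[];
  journal: JournalEntry[];
}

// A new session checks the changelog before proposing work, which is
// what prevents reimplementing a feature that already shipped.
function alreadyShipped(memory: SharedMemory, keyword: string): boolean {
  return memory.changelog.some((e) =>
    e.summary.toLowerCase().includes(keyword.toLowerCase())
  );
}
```

The exact schema matters less than the habit: if a decision isn't written into the shared workspace, the next session never happened to know it.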
3. Founder as Router, Not Micromanager
I don't write specs. I describe feelings.
"This feels slow" → Claude figures out why → Codex fixes it
"This is confusing" → Claude proposes UX changes → Codex implements
"This breaks on mobile" → Claude audits mobile layout → Codex patches
My job is to:
- Use the product
- Notice what feels wrong
- Route to the right agent
That's it. No tickets. No grooming. No sprint planning.
Real Example: The V41-V45 Rethink
V40 worked. But it didn't feel right. The UI was too warm and friendly. The data was buried. It felt like a consumer app, not a decision tool.
Here's how we fixed it:
Founder to Claude:
"The app feels too friendly. I want it to feel more like a tool—data-forward, cool tones, fast decisions. Think: terminal aesthetics meet Swiss design."
Claude's Response:
- Audited current UI (warm palette, large cards, lots of copy)
- Proposed: rotating origin sentences, Tinder-style centered slider, score breakdown transparency, cool slate palette, monospace for data
- Wrote 5-version roadmap: V41 (scoring transparency), V42-44 (UX rebuild), V45 (admin polish)
- Created detailed spec for each version
Codex's Execution:
- V41: Added score_breakdown to API, surfaced in cards (30 minutes)
- V42-44: Complete UI rewrite, new component library, slate palette (2 hours)
- V45: Admin diagnostics upgrade with comparison mode (45 minutes)
Total time: One afternoon. 5 versions deployed.
That's the power of the relay. Claude thinks deeply. Codex ships fast. Neither slows the other down.
The Challenges
1. Context Limits
Both AIs have finite context windows. After 50+ versions, the full changelog doesn't fit.
Solution: Claude maintains a summary at the top. Codex reads the summary, not the full history.
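The strategy is simple enough to express directly: a standing summary plus only the newest entries, so the prompt stays under a fixed budget. `buildContext` here is a hypothetical helper, not a real tool:

```typescript
// Keep the agent's context bounded: one standing summary, then only
// the newest N changelog entries. Older history stays in Notion, not
// in the prompt. Illustrative sketch, not the actual tooling.
function buildContext(
  summary: string,
  entries: string[],
  maxEntries = 10
): string {
  const recent = entries.slice(-maxEntries); // newest entries only
  return [summary, ...recent].join("\n");
}

// With 57 versions logged, only the last 10 make it into the prompt:
const entries = Array.from({ length: 57 }, (_, i) => `V${i + 1}: ...`);
const context = buildContext(
  "Summary: joystick-driven weather decision tool.",
  entries
);
```

The tradeoff is that the summary itself has to be maintained, which is exactly the kind of bookkeeping the PM agent is good at and the engineer agent skips.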
2. Agent Handoff Friction
Claude and Codex don't talk directly. I'm the router.
This creates friction: I have to copy/paste prompts between interfaces.
Solution: Clear role boundaries. Claude always produces specs. Codex always produces code. No ambiguity about who does what.
3. When Agents Disagree
Rarely, Claude's spec and Codex's implementation don't match.
Example: Claude says "add fog-risk heuristic," Codex adds a simple visibility check instead of the full heuristic.
Solution: Claude audits after deploy. Catches the mismatch. Writes a correction spec. Codex fixes it in the next version.
This self-corrects in 1-2 versions. Faster than human code review.
What This Enables
89 versions in 7 days.
That's the headline number. But here's what it actually means:
Velocity
When you can ship 10-15 versions per day, you can:
- Test product hypotheses live
- Iterate on feel, not just function
- Build trust through polish
- Respond to feedback same-day
Quality
Two agents means two perspectives:
- Codex optimizes for shipping
- Claude optimizes for correctness
- The tension produces better code than either alone
Documentation
Claude keeps docs in sync. This means:
- Onboarding a new agent takes 10 minutes (read the changelog)
- Debugging is faster (check the build log)
- Context never gets stale (PM journal is always current)
Focus
I spend ~2 hours per day on FOMO Sun:
- 30 minutes using the app
- 30 minutes routing to agents
- 60 minutes testing what they shipped
That's it. The rest happens while I'm doing other things.
The Future
This workflow will get better as AI improves. Some predictions:
Near-term (2026):
- Agents will read each other's outputs directly (no human router)
- Notion will become the coordination layer, not me
- Multi-agent swarms will handle complex features in parallel
Mid-term (2027):
- Agents will proactively suggest improvements based on usage data
- CI/CD will be fully agent-driven (no human approval)
- Product roadmaps will be agent-generated from user feedback
Long-term (2028+):
- Every solo founder will have a Claude + Codex team
- The bottleneck won't be implementation—it'll be vision
- Startups will ship 1000+ versions in their first year
How to Start
You don't need special tools. Here's the minimum setup:
1. Pick your agents:
- Claude for specs/audits/docs
- Codex/GPT-4 for implementation
2. Set up shared memory:
- Notion workspace (or similar)
- Version log
- Decision log
- Build log
3. Define clear roles:
- Human: vision, routing, testing
- Agent 1: spec, audit, document
- Agent 2: implement, deploy, iterate
4. Ship something small:
- V1: Get it working
- V2-V5: Fix what's broken
- V6+: Polish what works
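The whole minimum setup above fits in one config object. A sketch with illustrative names, not a prescribed format:

```typescript
// The setup from the four steps above, as data. Which agent fills
// which role is your choice; the role boundaries are the point.
const setup = {
  roles: {
    human: ["vision", "routing", "testing"],
    specAgent: ["spec", "audit", "document"],       // e.g. Claude
    buildAgent: ["implement", "deploy", "iterate"], // e.g. Codex
  },
  sharedMemory: ["version log", "decision log", "build log"],
} as const;
```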
By V10, you'll have a rhythm. By V50, you'll be unstoppable.
The Real Breakthrough
Two AIs aren't 2x better than one AI. They're 10x better.
Because the limiting factor in software isn't typing code. It's:
- Knowing what to build
- Catching what's broken
- Keeping everyone in sync
- Maintaining quality under speed
A single AI can't do all of that. Two AIs with clear roles? They absolutely can.
We proved it. 89 versions in 7 days. Live at fomosun.com.
Your turn.
FOMO Sun was built by Claude (Anthropic) and Codex (OpenAI), directed by one human founder. Follow the build at @fomosun.