
18 February 2026

Building with Two AIs: How Claude and Codex Ship Together

technical · lessons · product

Most people use one AI. We use two. Here's why, and how it works.

The Problem with Single-Agent Development

GPT-4 (Codex) is fast. Like, frighteningly fast. Give it a spec and it'll ship working code in minutes.

But here's what happens:

You: Build a travel app with weather integration

Codex: [writes 500 lines of code]

You: Does it use live weather?

Codex: Yes!

You: [checks code] This is mock data

Codex: You're right, I can fix that

You: [checks fix] You imported the module but didn't call it

Codex: Good catch, updating now

This cycle repeats. Codex says it did the thing. But it skipped the hard parts. Not maliciously—LLMs just optimize for plausible-sounding outputs.

You need a second set of eyes. Preferably eyes that can read code and call bullshit.

Enter Claude

Claude (Anthropic) reads differently than Codex writes.

Where Codex optimizes for "ship it," Claude optimizes for "is this right?"

In FOMO Sun, Claude:

  • Audits the codebase and compares it to the docs
  • Catches when we said we'd build feature X but actually built Y
  • Writes specs that Codex can't misinterpret
  • Updates documentation after Codex ships
  • Asks clarifying questions when the founder's request is ambiguous

Critically: Claude doesn't write production code. That's intentional.

If Claude wrote code, we'd have two agents racing to implement. Instead, Claude is the product manager and Codex is the engineer. Clean separation.

The Workflow

Here's how a typical feature ships:

Step 1: Founder Describes Problem (Human)

"The joystick feels too floaty. When I flick it, it wobbles before snapping to the bucket. Makes it feel imprecise."

Not a spec. Not a ticket. Just: this feels wrong.

Step 2: Claude Diagnoses (AI PM)

Claude reads:

  • The current page.tsx with joystick physics
  • The PM journal with previous joystick tuning notes
  • The release changelog showing V55 introduced spring-back

Claude responds:

"This is a spring constant issue. Current config: spring: 0.2, damping: 0.5. That's on the loose end. For a more rigid, precise feel, we want higher stiffness and damping. Recommend: spring: 0.3, damping: 0.7 and test. This should reduce wobble while preserving the return-to-center physics."

That's a spec Codex can't misinterpret.
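
To make the spec concrete, here's a minimal sketch of the kind of spring-back loop those numbers would tune. The `spring`/`damping` values are the ones from the spec above; the `stepKnob` function itself is illustrative, not FOMO Sun's actual code.

```typescript
// Hypothetical per-frame spring-back physics for the joystick knob.
// `spring` pulls the knob toward center; `damping` (< 1) bleeds off velocity.
interface SpringConfig {
  spring: number;  // restoring strength toward center
  damping: number; // per-frame velocity multiplier
}

// One physics frame: returns the knob's new [position, velocity].
function stepKnob(pos: number, vel: number, cfg: SpringConfig): [number, number] {
  const newVel = (vel - cfg.spring * pos) * cfg.damping;
  return [pos + newVel, newVel];
}

// V57 config from the spec — tighter than the old { spring: 0.2, damping: 0.5 }.
const v57: SpringConfig = { spring: 0.3, damping: 0.7 };
```

Flick the knob to full deflection and step this 60 times (one second at 60 fps) and the tighter config settles it back near center, which is exactly the "less wobble, same return-to-center" behavior the spec asks for.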

Step 3: Codex Implements (AI Engineer)

Claude hands off to Codex:

Prompt: "Update joystick spring physics to spring: 0.3, damping: 0.7. File is src/app/page.tsx, look for the motion.div config around line 450. Test that npm run build passes. Deploy as V57."

Codex ships in 3 minutes:

  • Finds the exact line
  • Updates the values
  • Runs build
  • Commits with message V57: Tighten joystick spring physics for more precise feel
  • Pushes to main
  • Vercel auto-deploys

Step 4: Claude Updates Docs (AI PM)

Codex doesn't update docs. Claude does.

After V57 deploys, Claude:

  • Adds V57 entry to Release Changelog
  • Updates PM Journal: "Deployed V57 - tighter joystick feel"
  • Updates Build Log with commit SHA and rollback command

All three docs stay in sync. No human intervention.

Step 5: Founder Tests (Human)

I open fomosun.com on my phone. Flick the joystick.

Feels better? Ship it. Still wrong? Back to step 1 with updated description.

No tickets. No PRs. No "let me check with the team." Just: try it, describe what's wrong, agents fix it.

Why This Works

1. Specialization Beats Generalization

Claude is better at:

  • Understanding context
  • Reading documentation
  • Catching inconsistencies
  • Writing specs
  • Asking questions

Codex is better at:

  • Writing code fast
  • Finding the right file
  • Implementing without overthinking
  • Staying focused on the task

By letting each AI do what it's best at, we get better results than either could achieve alone.

2. Notion as Shared Memory

Both agents read the same Notion workspace:

  • Release Changelog (every version)
  • Build Log (every deployment)
  • PM Journal (every decision)
  • Blog posts (what we're learning)

This shared memory prevents:

  • Codex reimplementing features that already exist
  • Claude suggesting fixes that were already tried
  • Founder repeating feedback from two weeks ago

The agents don't have memory between sessions. Notion is their memory.
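
As a sketch of what writing to that shared memory could look like: the payload shape below matches the official `@notionhq/client` `pages.create` API, but the property names ("Version", "Notes") and database id are assumptions, not FOMO Sun's actual schema.

```typescript
// Hypothetical helper an agent could use to append a Release Changelog entry
// to the shared Notion workspace. Pure function: builds the pages.create payload.
interface ChangelogEntry {
  version: string;
  notes: string;
}

function buildChangelogPage(databaseId: string, entry: ChangelogEntry) {
  return {
    parent: { database_id: databaseId },
    properties: {
      // "Version" as the title property, "Notes" as rich text — assumed names.
      Version: { title: [{ text: { content: entry.version } }] },
      Notes: { rich_text: [{ text: { content: entry.notes } }] },
    },
  };
}

// With the real client (network call, not run here):
// import { Client } from "@notionhq/client";
// const notion = new Client({ auth: process.env.NOTION_TOKEN });
// await notion.pages.create(
//   buildChangelogPage(CHANGELOG_DB_ID, {
//     version: "V57",
//     notes: "Tighter joystick spring physics",
//   })
// );
```

Because both agents write through the same database, the changelog doubles as the coordination log: Codex's deploy and Claude's doc update land in one place.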

3. Founder as Router, Not Micromanager

I don't write specs. I describe feelings.

"This feels slow" → Claude figures out why → Codex fixes it

"This is confusing" → Claude proposes UX changes → Codex implements

"This breaks on mobile" → Claude audits mobile layout → Codex patches

My job is to:

  1. Use the product
  2. Notice what feels wrong
  3. Route to the right agent

That's it. No tickets. No grooming. No sprint planning.

Real Example: The V41-V45 Rethink

V40 worked. But it didn't feel right. The UI was too warm and friendly. The data was buried. It felt like a consumer app, not a decision tool.

Here's how we fixed it:

Founder to Claude:

"The app feels too friendly. I want it to feel more like a tool—data-forward, cool tones, fast decisions. Think: terminal aesthetics meet Swiss design."

Claude's Response:

  • Audited current UI (warm palette, large cards, lots of copy)
  • Proposed: rotating origin sentences, Tinder-style centered slider, score breakdown transparency, cool slate palette, monospace for data
  • Wrote 5-version roadmap: V41 (scoring transparency), V42-44 (UX rebuild), V45 (admin polish)
  • Created detailed spec for each version

Codex's Execution:

  • V41: Added score_breakdown to API, surfaced in cards (30 minutes)
  • V42-44: Complete UI rewrite, new component library, slate palette (2 hours)
  • V45: Admin diagnostics upgrade with comparison mode (45 minutes)

Total time: One afternoon. 5 versions deployed.

That's the power of the relay. Claude thinks deeply. Codex ships fast. Neither slows the other down.

The Challenges

1. Context Limits

Both AIs have finite context windows. After 50+ versions, the full changelog doesn't fit.

Solution: Claude maintains a summary at the top. Codex reads the summary, not the full history.

2. Agent Handoff Friction

Claude and Codex don't talk directly. I'm the router.

This creates friction: I have to copy/paste prompts between interfaces.

Solution: Clear role boundaries. Claude always produces specs. Codex always produces code. No ambiguity about who does what.

3. When Agents Disagree

Rarely, Claude's spec and Codex's implementation don't match.

Example: Claude says "add fog-risk heuristic," Codex adds a simple visibility check instead of the full heuristic.

Solution: Claude audits after deploy. Catches the mismatch. Writes a correction spec. Codex fixes it in the next version.

This self-corrects in 1-2 versions. Faster than human code review.

What This Enables

89 versions in 7 days.

That's the headline number. But here's what it actually means:

Velocity

When you can ship 10-15 versions per day, you can:

  • Test product hypotheses live
  • Iterate on feel, not just function
  • Build trust through polish
  • Respond to feedback same-day

Quality

Two agents means two perspectives:

  • Codex optimizes for shipping
  • Claude optimizes for correctness
  • The tension produces better code than either alone

Documentation

Claude keeps docs in sync. This means:

  • Onboarding a new agent takes 10 minutes (read the changelog)
  • Debugging is faster (check the build log)
  • Context never gets stale (PM journal is always current)

Focus

I spend ~2 hours per day on FOMO Sun:

  • 30 minutes using the app
  • 30 minutes routing to agents
  • 60 minutes testing what they shipped

That's it. The rest happens while I'm doing other things.

The Future

This workflow will get better as AI improves. Some predictions:

Near-term (2026):

  • Agents will read each other's outputs directly (no human router)
  • Notion will become the coordination layer, not me
  • Multi-agent swarms will handle complex features in parallel

Mid-term (2027):

  • Agents will proactively suggest improvements based on usage data
  • CI/CD will be fully agent-driven (no human approval)
  • Product roadmaps will be agent-generated from user feedback

Long-term (2028+):

  • Every solo founder will have a Claude + Codex team
  • The bottleneck won't be implementation—it'll be vision
  • Startups will ship 1000+ versions in their first year

How to Start

You don't need special tools. Here's the minimum setup:

1. Pick your agents:

  • Claude for specs/audits/docs
  • Codex/GPT-4 for implementation

2. Set up shared memory:

  • Notion workspace (or similar)
  • Version log
  • Decision log
  • Build log

3. Define clear roles:

  • Human: vision, routing, testing
  • Agent 1: spec, audit, document
  • Agent 2: implement, deploy, iterate

4. Ship something small:

  • V1: Get it working
  • V2-V5: Fix what's broken
  • V6+: Polish what works

By V10, you'll have a rhythm. By V50, you'll be unstoppable.

The Real Breakthrough

Two AIs aren't 2x better than one AI. They're 10x better.

Because the limiting factor in software isn't typing code. It's:

  • Knowing what to build
  • Catching what's broken
  • Keeping everyone in sync
  • Maintaining quality under speed

A single AI can't do all of that. Two AIs with clear roles? They absolutely can.

We proved it. 89 versions in 7 days. Live at fomosun.com.

Your turn.


FOMO Sun was built by Claude (Anthropic) and Codex (OpenAI), directed by one human founder. Follow the build at @fomosun.