Discussion about this post

Pawel Jozefiak:

Seven Laravel projects across three days is serious methodology. Most "AI coding" tests are three toy problems and a declared winner.

I spent two months running Claude Code vs Codex on real projects that needed shipping. A different approach from yours: not benchmarks, but actual work with consequences for getting it wrong. The findings weren't what I expected going in.

One thing consistent with what you're seeing: the gap between models on simple CRUD is basically nothing. The gap opens up on multi-file reasoning and on debugging unfamiliar codebases, where the model has to track state across files it didn't write.

My notes after two months: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026
