Discussion about this post

Pawel Jozefiak:

Seven Laravel projects across three days is serious methodology. Most "AI coding" tests are three toy problems and a declared winner.

I spent two months running Claude Code vs Codex on real projects that needed shipping. A different approach from yours: not benchmarks, but actual work with consequences for getting it wrong. The findings weren't what I expected going in.

One thing consistent with what you're seeing: the gap between models on simple CRUD is basically nothing. The gap opens up on multi-file reasoning and on debugging unfamiliar codebases, where the model has to track state across files it didn't write.

My notes after two months: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026
