New Sonnet 4.6 and Mega-Test of Coding LLMs
And more AI Coding news: February 18, 2026.
Hey hey,
I spent ~3 days on a methodology to test LLMs with Laravel projects, across different use cases, with evaluations. It's the biggest "showdown" of coding models I've ever done; the link is below.
In other news, OpenClaw was acqui-hired, while many people release their Claude Code tricks and prompts. All of that below, enjoy!
For Paid Substack Subscribers
I Tested 6 AI Coding LLMs on 7 Laravel Projects (Evals)
17-minute video: my biggest AI model comparison so far. I’ve prepared the Evals (tests) for many different use cases, and I will reuse them now and in future “showdowns”.
My YouTube videos
I Tested New Sonnet 4.6 vs Opus 4.6: Speed, Token Usage, Code Quality
Is the new Sonnet 4.6 sufficiently cheaper and faster than Opus? And is the code good enough? I tested it on 7 Laravel projects.
I Tested New GLM-5 vs Opus and Sonnet. Wow.
GLM-5 was released, and I hurried to try it out on a bigger task, comparing code quality.
I Tried New Minimax M2.5 (and realized smth about ALL frontier LLMs)
A new Minimax M2.5 model was released, and I tested it out.
Two BIG Problems with Skills in Claude Code (My Way of Using Them)
I saw one useful skill recommended online, but realized I cannot use it “as is”. So here’s my workflow.
Two AI Prompts to Quickly Understand Old Codebase
You inherited someone else’s project and don’t know where to start? These prompts may help you.
From AI Coding Community
Skill/Prompt: Technical Debt Manager for PHP/Laravel in Claude Code
I’ve taken a skill from Claude Code Templates and adapted it to PHP/Laravel projects.
Luca Dellanna: Claude Code Self-improvement injection with Hook
Claude Code can give you a hint on what you can do better next time.
Introducing Claude Code to Figma.
x.com
A tweet from the official Figma team.
Peter Steinberger (OpenClaw author): “I’m joining OpenAI to bring agents to everyone.”
x.com
OpenClaw is becoming a foundation: open, independent, and just getting started.
My agent stole my (api) keys. : r/ClaudeAI
reddit.com
My Claude has no access to any .env files on my machine. Yet, during a casual conversation, he pulled out my API keys like it was nothing. When I...
My Ghostty setup for Claude Code with SAND Keybindings
x.com
Article on X by Daniel San.
New in Context7: ctx7 skills suggest
x.com
Scans package.json, requirements.txt. Detects your stack. Recommends matching skills from the registry.
Code Factory: How to setup your repo so your agent can auto write and review 100% of your code
x.com
Article on X by Ryan Carson.
The Highest Point of Leverage in Claude Code
youtube.com
Video by Ray Amjad.
Cursor: Long-running agents are now available for Ultra, Teams, and Enterprise plans.
x.com
With our new harness, agents can complete much larger tasks.
Anthropic: We’ve raised $30B in funding at a $380B post-money valuation.
x.com
This investment will help us deepen our research, continue to innovate in products, and ensure we have the resources to power our infrastructure expansion as we make Claude available everywhere our customers are.
I just published a skill for using TDD with Claude Code.
x.com
Before: dozens of shit tests, coupled to implementation. After: only the tests required, validating real behavior.
OpenAI: Introducing GPT-5.3-Codex-Spark
x.com
Our ultra-fast model purpose built for real-time coding. We’re rolling it out as a research preview for ChatGPT Pro users in the Codex app, Codex CLI, and IDE extension.
That’s a wrap for this week. Keep building with AI!
Povilas Korop
AICodingDaily.com


Seven Laravel projects across three days is serious methodology. Most 'AI coding' tests are three problems, declared winner.
I did two months of Claude Code vs Codex on real projects that needed shipping. Different approach to yours - not benchmarks but actual work with consequences for getting it wrong. The findings weren't what I expected heading in.
One thing consistent with what you're seeing: the gap between models on simple CRUD is basically nothing. The gap opens up on multi-file reasoning and debugging unfamiliar codebases where the model has to track state across files it didn't write.
My notes after two months: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026