Short answer:

“No. I believe all the primary/foundation models are pretty similar in their ‘intelligence’ level.”

Explanation:

Models I use: GPT 5.2, Sonnet 4.5, and Gemini 3 Pro and Flash, all in my “AgentAutoFlow” framework that runs inside Roo Code in VS Code for software dev.

Every day or two, I switch the model each of my custom agents/modes uses, for various reasons. About once every week or two, I try some other model, but for now, those three are the best. I had hopes for Grok 4.1, which is quite “smart,” but what disqualifies it for me is that it sucks at following directions, so I only use it for certain kinds of research.

What is “best”? I want intelligence, instruction following, and economy. Speed is a distant fourth on my priority list.

Over the course of 2025 I learned a few things that made a *huge* difference, which led me to find that all of the models I mentioned above do very, very well for my purposes, and… GPT 5.2 turns out to run a bit cheaper than the other two. I hate to admit it, because I think of Sam Altman as opportunist scum. Unfortunately, GPT 5.2 “only” has a 400K context window, so for some tasks, like my architect/planner agent/mode, I usually use Sonnet 4.5 because of its 1M-token context window. Tip: don’t let any model get filled past 60% or so of its overall context window size! Look up “context window compression”, which isn’t the only way to keep context windows small.

Basically, every current LLM’s performance starts degrading once its context window fills up somewhere in that 60% range.
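For example, here’s a minimal sketch of that kind of 60% guard, assuming a rough 4-characters-per-token estimate and a stand-in summarize() helper; this isn’t Roo Code’s actual compression mechanism, it just shows the idea:

```python
# Minimal sketch of a context-budget guard. Assumes ~4 characters per token;
# summarize() below is a stand-in for whatever you'd really use (e.g. a cheap
# model call) to compress older messages.

CONTEXT_LIMIT_TOKENS = 400_000   # e.g. GPT 5.2's window, per the numbers above
FILL_THRESHOLD = 0.60            # start compressing once past ~60% full

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return len(text) // 4

def summarize(text: str) -> str:
    """Stand-in: a real setup would ask a cheap model for a summary;
    here we just truncate so the sketch stays self-contained."""
    return text[:2000]

def maybe_compress(messages: list[str]) -> list[str]:
    """If the conversation is past the fill threshold, fold the oldest half
    of the messages into a single summary message."""
    used = sum(estimate_tokens(m) for m in messages)
    if used < CONTEXT_LIMIT_TOKENS * FILL_THRESHOLD:
        return messages  # still comfortably under budget
    half = len(messages) // 2
    summary = summarize("\n".join(messages[:half]))
    return [f"[Summary of earlier context]\n{summary}"] + messages[half:]
```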

Anyway, some of the things I learned, which mostly involve the custom instructions I gave each agent/mode (think architect, orchestrator, coder, coder jr., debugger, tester, docs writer, etc.):
– Keep a “source of knowledge about the app” file constantly updated, and make sure it is named, located, and referred to in a way that guarantees your agents/modes actually see it.
– Write strict guidelines about the entire app, pieces of the app, naming conventions, etc.
– Give each agent/mode its own strict instructions and guidelines.
– Do what you can to keep context window content as small as possible, including – like I do – having agents/modes call each other, clearing context and passing only the “plan file” and “log file” locations (see the sketch after this list). Oh, and a bonus of that methodology: any interruption in their planning/working process can be recovered just by pointing them back to the plan and log files!
– Use certain patterns that draw the agent/mode’s attention to something important, rather than just writing “IMPORTANT” or “CRITICAL”.
– Use Skills instead of MCP when you can! When Roo Code added the ability to use Skills, it tripled how effective my AgentAutoFlow is! Agent Skills open format: https://agentskills.io/home
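Here’s a minimal sketch of the plan-file/log-file handoff from the list above, in Python. The file paths, the hand_off() helper, and the message wording are all made up for illustration (Roo Code’s actual mode-switching works through its own tools); the point is only that the message passed between agents/modes carries file locations, not the content itself:

```python
from pathlib import Path

# Hypothetical paths and helper, purely to illustrate the
# "pass file locations, not content" handoff described above.
PLAN_FILE = Path("docs/plan.md")
LOG_FILE = Path("docs/worklog.md")

def hand_off(to_mode: str, task: str) -> str:
    """Build the tiny handoff message one agent/mode sends to another.
    The receiving mode reads the plan and log files itself, so the
    conversation context stays small and any interruption can be
    resumed by re-sending this same message."""
    return (
        f"Hand-off to: {to_mode}\n"
        f"Task: {task}\n"
        f"Read the plan at {PLAN_FILE} and the running log at {LOG_FILE} "
        f"before doing anything. Append what you did to the log when done."
    )

# Example: the orchestrator delegating one step to the coder mode.
print(hand_off("coder", "Implement step 3 of the plan"))
```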

Watch the animated musical story / prediction of where things could be going and how we get there. Trigger warning: It starts out dark.

A music video of the future by Scott Howard Swain