Unfortunately, most of the historical data has been paywalled since I last looked at it, but it's interesting that Opus has dipped below Sonnet on overall performance.
Interesting! I was just thinking about pinging the creator of simple-bench.com and asking whether they intend to re-benchmark models after 3 months. I've noticed Gemini models in particular dropping dramatically in quality after the initial hype cycle. Gemini 3 Pro _was_ my top performer and has slowly declined to 'is it worth asking', complete with gpt-4o style glazing. It's been frustrating. I had been working on a very custom benchmark, and over the course of it Gemini 3 Pro and Flash both started underperforming by 20% or more. I wondered if I had subtly broken my benchmark, but ultimately I started seeing the same behavior in general online queries (Google AI Studio).