Google's Gemini 2.5 Pro Claims AI Benchmark Crown

Google's Gemini 2.5 Pro Claims AI Benchmark Crown

Google has unveiled Gemini 2.5 Pro Experimental, a model designed to "think before responding" that has claimed top positions across major AI benchmarks, solidifying the company's competitive position against OpenAI and other AI labs.

Key Points:

  • The model features 1M token context window (2M coming soon)
  • Tops LMArena leaderboard by a significant margin
  • Achieves state-of-the-art 18.8% on Humanity's Last Exam
  • Excels at coding with 63.8% on SWE-Bench Verified

The release marks Google's full commitment to "thinking models" – AI systems that pause to reason through problems before generating answers. This approach has quickly become the new battleground for frontier AI labs since OpenAI debuted its first reasoning model, o1, in September 2024.

The model has taken the top spot on LMArena, a benchmark measuring human preferences, surpassing competing models by what Google calls a "significant margin." The company also reports state-of-the-art performance on challenging reasoning tasks, including an 18.8% score on Humanity's Last Exam – a dataset created to test the limits of AI knowledge and reasoning.

For developers, perhaps the most interesting advancement comes in coding capabilities. Google claims Gemini 2.5 Pro achieves 68.6% on Aider Polyglot, a code editing benchmark, outperforming models from OpenAI, Anthropic, and DeepSeek. On SWE-Bench Verified, a standard for measuring software development abilities, it scores 63.8% – though this falls short of Anthropic's Claude 3.7 Sonnet, which scored 70.3%.

Google CEO Sundar Pichai shared a demo on X showcasing a playable dinosaur game that the model created from a single-line prompt, generating executable code through its reasoning capabilities.

Google is making Gemini 2.5 Pro immediately available in Google AI Studio and to subscribers of its Gemini Advanced plan, with Vertex AI integration coming "in the coming weeks." The company has yet to announce pricing for API access, promising those details soon.

Another big advantage of Gemini 2.5 Pro is the model's 1 million token window – enough to process roughly 750,000 words at once, equivalent to the entire "Lord of the Rings" series – with plans to double that capacity soon.

With OpenAI, Anthropic, DeepSeek, and xAI all pushing the boundaries of AI reasoning capabilities, Google's release is both a significant technical achievement and a strategic positioning in the increasingly competitive AI landscape.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.

Let’s stay in touch. Get the latest AI news from Maginative in your inbox.

Subscribe