Google Unveils Next-Gen AI Reasoning Models

On Tuesday, Google unveiled Gemini 2.5, a new family of AI reasoning models designed to pause and “think” before answering a question.

To launch this new model family, Google is introducing Gemini 2.5 Pro Experimental, a multimodal, reasoning AI model that the company claims is its most intelligent yet. This model is available starting today on Google AI Studio and in the Gemini app for subscribers to the company’s $20-a-month AI plan, Gemini Advanced.

Moving forward, Google states that all its upcoming AI models will feature built-in reasoning capabilities.

Since OpenAI launched the first AI reasoning model, o1, in September 2024, the tech industry has raced to match and surpass its capabilities. Today, Anthropic, DeepSeek, Google, and xAI all offer AI reasoning models, which use extra computing power to fact-check and reason through problems before delivering responses.

Reasoning techniques have significantly improved AI performance in math and coding tasks. Many in the industry believe these models will be crucial for AI agents—autonomous systems capable of executing tasks with minimal human intervention. However, these models are computationally expensive.

Google has previously experimented with AI reasoning models, releasing a “thinking” version of Gemini in December. But Gemini 2.5 represents the company’s most serious attempt yet to surpass OpenAI’s “o” series.

Performance Benchmarks

Google claims Gemini 2.5 Pro outperforms its previous frontier AI models, as well as some leading competitors, on a range of benchmarks. In particular, Gemini 2.5 Pro is designed to excel at creating visually compelling web apps and at agentic coding tasks.

  • On the Aider Polyglot benchmark (measuring code editing capabilities), Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and DeepSeek.
  • On the SWE-bench Verified test (measuring software development abilities), Gemini 2.5 Pro scores 63.8%, surpassing OpenAI’s o3-mini and DeepSeek’s R1, but trailing behind Claude 3.7 Sonnet, which achieved 70.3%.
  • On Humanity’s Last Exam, a multimodal benchmark featuring thousands of crowdsourced questions spanning mathematics, humanities, and natural sciences, Google reports that Gemini 2.5 Pro scores 18.8%, outperforming most rival flagship models.

Context Length & Future Plans

At launch, Gemini 2.5 Pro ships with a 1-million-token context window, meaning it can process roughly 750,000 words in a single prompt—longer than the entire “Lord of the Rings” book series. Google says it plans to expand this to 2 million tokens soon.

Google has not yet disclosed API pricing for Gemini 2.5 Pro but plans to share more details in the coming weeks.
