A New, Challenging AGI Test Stumps Most AI Models

March 25, 2025

The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced on Monday that it has developed a new, challenging test to measure the general intelligence of AI models.

The test, called ARC-AGI-2, has so far stumped most models.

AI models known for their reasoning capabilities, such as OpenAI’s o1-pro and DeepSeek’s R1, have scored between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Meanwhile, powerful non-reasoning models like GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash scored around 1%.

How ARC-AGI-2 Works

The ARC-AGI tests are designed as visual puzzles, requiring AI models to identify patterns in grids of colored squares and generate the correct “answer” grid. These problems force AI to adapt to novel challenges it hasn’t encountered before.
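To make the puzzle format concrete, here is a minimal Python sketch of an ARC-style task. It is an illustration only, not the foundation's code or data format: the grid encoding (integers standing for colors), the three candidate rules, and the helper names `infer_rule`, `solve`, and `is_correct` are all assumptions made for this example. Real ARC-AGI-2 puzzles involve far richer rules than a flip or a transpose.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # assumption: each integer stands for one color


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical, deliberately tiny rule set for the sketch.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(examples: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule that explains every example pair."""
    for name, rule in CANDIDATES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None


def solve(examples: List[Tuple[Grid, Grid]], test_input: Grid) -> Optional[Grid]:
    """Infer a rule from the examples and apply it to the unseen grid."""
    name = infer_rule(examples)
    return CANDIDATES[name](test_input) if name is not None else None


def is_correct(predicted: Optional[Grid], expected: Grid) -> bool:
    """An answer counts only if the whole grid matches exactly."""
    return predicted == expected


# Two demonstration pairs that share one hidden rule (a left-right mirror).
examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]
answer = solve(examples, [[0, 4, 5], [6, 0, 0]])
print(answer)
```

The point of the sketch is the shape of the problem: the solver sees a few input and output pairs, has to work out the rule from them alone, and is graded on producing the exact answer grid for a new input.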

To establish a human baseline, the Arc Prize Foundation tested over 400 people on ARC-AGI-2. On average, humans correctly answered 60% of the test’s questions—far exceeding AI performance.

A More Accurate AGI Benchmark

François Chollet stated that ARC-AGI-2 is a better measure of an AI model’s actual intelligence than its predecessor, ARC-AGI-1.

The new test prevents AI from relying on brute-force methods—which require massive computing power—to solve problems. ARC-AGI-1 had this flaw, as OpenAI’s o3 model used sheer computational strength to eventually surpass human performance in December 2024.

To fix these issues, ARC-AGI-2 introduces a new metric: efficiency. Instead of relying on memorization, models must interpret patterns on the fly.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores. The efficiency with which those capabilities are acquired and deployed is a crucial, defining component.”

The Arc Prize 2025 Challenge

The launch of ARC-AGI-2 comes amid growing concerns in the AI industry that existing benchmarks fail to measure true artificial general intelligence (AGI).

Thomas Wolf, co-founder of Hugging Face, recently told TechCrunch that AI benchmarks are insufficient for evaluating key AGI traits, such as creativity and adaptability.

To push AI research forward, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging AI developers to achieve 85% accuracy on ARC-AGI-2 while only spending $0.42 per task.
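The contest target combines two conditions, accuracy and cost, and both must hold at once. The short Python sketch below shows that check under stated assumptions: the article does not say how the foundation measures cost, so averaging total spend over all tasks is a guess, and the function name `evaluate_run` and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.85       # 85% accuracy, from the contest announcement
MAX_COST_PER_TASK_USD = 0.42  # $0.42 per task, from the contest announcement


@dataclass
class RunResult:
    accuracy: float
    cost_per_task_usd: float
    meets_target: bool


def evaluate_run(solved: int, total: int, total_cost_usd: float) -> RunResult:
    """Check a run against both contest conditions.

    Assumption: cost per task is the total spend divided by the number
    of tasks attempted. The real contest rules may define it differently.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of tasks")
    accuracy = solved / total
    cost_per_task = total_cost_usd / total
    meets = accuracy >= TARGET_ACCURACY and cost_per_task <= MAX_COST_PER_TASK_USD
    return RunResult(accuracy, cost_per_task, meets)


# Made-up runs: one accurate and cheap, one accurate but expensive.
cheap = evaluate_run(solved=103, total=120, total_cost_usd=48.0)
pricey = evaluate_run(solved=110, total=120, total_cost_usd=600.0)
print(cheap.meets_target, pricey.meets_target)
```

A run that solves most tasks but spends several dollars on each one fails this check just as surely as a cheap run with low accuracy, which is the behavior the efficiency requirement is meant to enforce.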

This challenge could become a milestone in AGI development, as researchers strive to create more efficient, adaptable AI systems.
