The World’s Toughest AI Exam Tests Reasoning, Not AGI Yet

A new benchmark called Humanity’s Last Exam aims to measure how close today’s AI models come to human-level knowledge by presenting 2,500 carefully vetted, PhD-level questions across more than 100 subjects. Launched in 2025, it has been attempted by top models such as GPT-4o and Google Gemini. The top score reported so far is 48.4% (Gemini 3 Deep Think), far below typical human expert performance (~90%). The test prioritizes precise, non-searchable knowledge and verifiable answers, and it filters out questions that AI could answer via web search. While a high score would indicate expert-level capability in specific domains, researchers say it does not by itself signal AGI or autonomous, general intelligence.
- Acing this new AI exam, which its creators say is the toughest in the world, might point to the first signs of AGI (Live Science)
- Don’t Panic: ‘Humanity’s Last Exam’ has begun (Texas A&M Stories)
- "Humanity’s Last Exam": The Super-Benchmark AI Is Currently Failing (Neuroscience News)
- Stay Calm: ‘Humanity’s Final Test’ Has Begun (Bioengineer.org)
- Researchers Launch “Humanity’s Last Exam” to Measure Frontier AI Capabilities (BABL AI)