The World’s Toughest AI Exam Tests Reasoning, Not AGI Yet

February 28, 2026 at 07:05 AM


TL;DR Summary

A new benchmark called Humanity’s Last Exam aims to measure how close today’s AI models come to human-level knowledge by presenting 2,500 carefully vetted, PhD-level questions across more than 100 subjects. Launched in 2025, it has been attempted by top models including GPT-4o and Google Gemini. The top score reported so far is 48.4% (Gemini 3 Deep Think), far below typical human expert performance (~90%). The test prioritizes precise, non-searchable knowledge and verifiable answers, filtering out questions that AI could answer via web search. While a high score would indicate expert-level capability in specific domains, researchers say it does not by itself signal AGI or autonomous, general intelligence.

Topics: science #agi #artificial-intelligence #benchmarking #gemini-deep-think #humanitys-last-exam #technology

Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI (Live Science)
Don’t Panic: ‘Humanity’s Last Exam’ has begun (Texas A&M Stories)
"Humanity’s Last Exam": The Super-Benchmark AI Is Currently Failing (Neuroscience News)
Stay Calm: ‘Humanity’s Final Test’ Has Begun (Bioengineer.org)
Researchers Launch “Humanity’s Last Exam” to Measure Frontier AI Capabilities (BABL AI)

Want the full story? Read the original article

Read on Live Science
