AI’s 2,500-Question Gauntlet Tests the Real Limits of Machine Intelligence

Researchers unveiled Humanity’s Last Exam (HLE), a 2,500-question global benchmark spanning math, the humanities, science, and niche disciplines, designed to probe AI’s true limits beyond older tests. Early models scored very low, and even recent top systems reach roughly 40–50%, highlighting that high scores on human benchmarks don’t guarantee genuine understanding. Designed as a long-term, transparent gauge, HLE helps policymakers and developers assess capabilities and risks while keeping most questions hidden to prevent memorization. The project draws on international experts, including Texas A&M’s Dr. Tung Nguyen, and is described in a Nature paper, with details at lastexam.ai.
- Don’t Panic Yet: “Humanity’s Last Exam” Has Begun SciTechDaily
- Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI Live Science
- Stay Calm: ‘Humanity’s Final Test’ Has Begun BIOENGINEER.ORG
- Don't panic: 'Humanity's last exam' has begun Tech Xplore
- Researchers Launch “Humanity’s Last Exam” to Measure Frontier AI Capabilities BABL AI