Tag

AI Benchmarks

All articles tagged with #ai benchmarks

AI’s 2,500-Question Gauntlet Tests the Real Limits of Machine Intelligence
technology · 4 hours ago

Researchers have unveiled Humanity’s Last Exam (HLE), a 2,500-question global benchmark spanning math, the humanities, science, and niche disciplines, built to probe AI's limits beyond what older tests can measure. Early models scored very low, and even recent top systems reach only roughly 40–50%, a reminder that high scores on human benchmarks don’t guarantee genuine understanding. Designed as a long-term, transparent gauge, HLE helps policymakers and developers assess capabilities and risks while keeping most questions hidden to prevent memorization. The project involves international experts, including Texas A&M’s Dr. Tung Nguyen, and is described in a Nature paper, with details at lastexam.ai.

OpenAI Enhances Codex with GPT-5 Upgrade
technology · 5 months ago

OpenAI has released GPT-5-Codex, an upgraded version of its AI coding agent. The model features dynamic thinking, which lets it spend varying amounts of time on a task, improving performance on coding benchmarks and code reviews. It is now available to various ChatGPT users and will be accessible via API in the future, as OpenAI aims to stay competitive in the crowded AI coding market.

Google Launches Gemini Deep Think AI for Advanced Parallel Reasoning
technology · 7 months ago

Google DeepMind has launched Gemini 2.5 Deep Think, its most advanced multi-agent AI reasoning model. It explores multiple ideas simultaneously to improve answer quality, outperforms other models on various benchmarks, and integrates tools such as code execution and search. The model is available to subscribers and aims to enhance research and problem-solving capabilities, with plans for broader testing.