Tag

Ai Benchmarks

All articles tagged with #ai benchmarks

technology5 hours ago•8 min saved

AI’s 2,500-Question Gauntlet Tests the Real Limits of Machine Intelligence

Researchers unveiled Humanity’s Last Exam (HLE), a 2,500-question global benchmark spanning math, the humanities, science, and niche disciplines to probe AI's true limits beyond older tests. Early models scored very low and even recent top systems reach roughly 40–50%, highlighting that high scores on human benchmarks don’t guarantee genuine understanding. Designed as a long-term, transparent gauge, HLE helps policymakers and developers assess capabilities and risks while keeping most questions hidden to prevent memorization; the project includes international experts including Texas A&M’s Dr. Tung Nguyen and is described in a Nature paper with details at lastexam.ai.

via SciTechDaily|

#ai-benchmarks #artificial-intelligence #humanitys-last-exam

technology2 months ago•1 min saved

Google's Gemini 3 Flash Outperforms GPT-5.2 and Launches Globally

Google's Gemini 3 Flash, a more efficient version of its latest AI model, performs comparably to GPT-5.2 in benchmarks, even surpassing it in some tests, signaling a significant advancement in AI capabilities and competition with OpenAI.

via Engadget|

#ai-benchmarks #ai-model-performance #google-ai-updates

technology5 months ago•1 min saved

AMD Launches ROCm 7.0 with Major Performance Boosts to Challenge Nvidia

The AMD Ryzen AI Max+ 'Strix Halo' SoCs successfully ran ROCm 7.0 on Ubuntu Linux, despite not being listed on the supported GPU list, with benchmarks showing functional performance on AI tasks and graphics workloads.

via Phoronix|

#ai-benchmarks #amd-ryzen-ai-max #gpu-support

technology5 months ago•2 min saved

OpenAI Enhances Codex with GPT-5 Upgrade

OpenAI has released GPT-5-Codex, an upgraded version of its AI coding agent, which features dynamic thinking capabilities allowing it to spend varying amounts of time on tasks, improving performance on coding benchmarks and code reviews. The model is now available to various ChatGPT users and will be accessible via API in the future, aiming to enhance competitiveness in the crowded AI coding market.

via TechCrunch|

#ai-benchmarks #ai-coding #gpt-5-codex

technology7 months ago•3 min saved

Google Launches Gemini Deep Think AI for Advanced Parallel Reasoning

Google DeepMind has launched Gemini 2.5 Deep Think, its most advanced multi-agent AI reasoning model capable of exploring multiple ideas simultaneously to improve answer quality, outperforming other models on various benchmarks, and integrating tools like code execution and search. The model is available to subscribers and aims to enhance research and problem-solving capabilities, with plans for broader testing.

via TechCrunch|

#ai-benchmarks #ai-reasoning-model #google-gemini-deep-think

technology8 months ago•1 min saved

Google unveils enhanced Gemini 2.5 Pro with improved coding and increased query limits

Google has released an upgraded preview of Gemini 2.5 Pro, enhancing coding capabilities and overall performance, with improvements in creativity and response formatting, set for general availability in a few weeks.

via 9to5Google|

#ai-benchmarks #ai-model-update #gemini-25-pro