Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI
Summary
Researchers from the Center for AI Safety and Scale AI introduced "Humanity’s Last Exam" (HLE), a rigorous test designed to gauge how close current AI models are to human-level knowledge across more than 100 subjects. The exam consists of 2,500 non-searchable, precise, and unambiguous questions, vetted by more than 1,000 subject-matter experts, and is intended to overcome the limitations of existing benchmarks such as MMLU. Models performed poorly in initial testing, with OpenAI’s o1 scoring only 8.3%, though the researchers predicted that models could reach 50% accuracy by the end of 2025. As of February 2026, the highest score was 48.4%, achieved by Google’s Gemini 3 Deep Think, well below the 90% achieved by human experts. The creators emphasize that high accuracy on HLE demonstrates expert-level performance on verifiable questions, but that it is a necessary, not a sufficient, condition for concluding that artificial general intelligence (AGI) has been reached.
(Source: Live Science)