Our First Proof submissions

OpenAI
OpenAI shared proof attempts for the challenging First Proof math competition to test AI's ability to produce checkable, domain-specific arguments.

Summary

OpenAI released its proof attempts for the First Proof challenge, a research-level math competition designed to test whether AI can generate correct, end-to-end, checkable arguments in specialized domains. The model attempted all 10 problems. Based on expert feedback, OpenAI believes at least five submissions (problems 4, 5, 6, 9, and 10) have a high chance of being correct, though it has revised its initial assessment of problem 2. The company views frontier challenges like First Proof as crucial for evaluating capabilities beyond what standard benchmarks measure, such as sustaining long chains of reasoning and handling ambiguity. The process involved limited human supervision, including suggesting retries and using ChatGPT for verification, and OpenAI acknowledges that the sprint was not perfectly controlled. This work builds on earlier milestones for frontier reasoning models, including IMO performance and GPT-5 case studies, and OpenAI is seeking community engagement on future rigorous evaluations.

(Source: OpenAI)