A shared playbook for trustworthy third-party evaluations
Summary
This article discusses the role of independent third-party evaluations for frontier AI models. OpenAI emphasizes that as models evolve into autonomous agents capable of using tools and carrying out multi-step workflows, evaluations must move beyond simple chatbot-style interactions. The authors introduce the concept of the 'harness' (the environmental setup that facilitates model actions) as a key factor in determining measured performance. They provide a playbook for researchers to ensure transparency, recommending that reports explicitly document the claims being tested, the specific harness and budget used, and checks for hazards such as reward hacking, contamination, and sandbagging. The goal is to establish standardized, rigorous evaluation practices that accurately reflect both capabilities and safety risks.
(Source: OpenAI)
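
The reporting items listed in the summary (the claim being tested, the harness, the budget, and the hazard checks) could be captured as a small completeness check. The following is a minimal sketch only: the `EvalReport` class, its field names, and the `missing_items` helper are hypothetical illustrations and do not come from the OpenAI article or any published schema.

```python
from dataclasses import dataclass, field

# Hazard names are taken from the summary; everything else here is a
# hypothetical layout chosen for illustration.
HAZARDS = ("reward_hacking", "contamination", "sandbagging")


@dataclass
class EvalReport:
    claim: str = ""    # what the evaluation is meant to show
    harness: str = ""  # description of the environment and tools the model used
    budget: str = ""   # e.g. attempts, tokens, or time allowed (free text)
    # Maps a hazard name to a short note describing how it was checked.
    hazard_checks: dict = field(default_factory=dict)

    def missing_items(self) -> list:
        """Return the names of reporting items that are still undocumented."""
        missing = [
            name
            for name in ("claim", "harness", "budget")
            if not getattr(self, name).strip()
        ]
        missing += [
            f"hazard_check:{hazard}"
            for hazard in HAZARDS
            if hazard not in self.hazard_checks
        ]
        return missing
```

For example, a report that states a claim and a harness but omits the budget and two of the three hazard checks would list those gaps when `missing_items()` is called, which a reviewer could use as a pre-publication checklist.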