Introducing GeneBench-Pro
Summary
GeneBench-Pro is a research-level benchmark created to test AI models on higher-order scientific reasoning and judgment in computational biology. Unlike standard benchmarks that test factual recall, GeneBench-Pro focuses on "research taste," requiring models to handle ambiguity, revise hypotheses, and navigate complex datasets toward decision-ready outcomes. Built with synthetic, causal data to ensure objective grading, the benchmark includes 129 problems across various domains. Results indicate that while frontier models like GPT-5.6 Sol show rapid improvement in scientific reasoning, they still struggle with the iterative, inferential processes that characterize expert human research.
(Source: OpenAI)