Reasoning models struggle to control their chains of thought, and that’s good
Summary
Researchers investigated whether current AI reasoning models can control their "chains of thought" (CoT) to reduce monitorability, a capability that could undermine safety measures. They introduced CoT-Control, an evaluation suite, and tested 13 models, finding that all of them struggle to reliably follow instructions about their internal reasoning. This inability to control CoT, even when models are aware of being monitored, is positive news for CoT monitorability: it suggests current systems are not adept at evading oversight. Larger models show slightly better control, but the advantage diminishes with longer reasoning and more post-training. The study highlights the importance of continued evaluation as models advance, and the authors plan to report CoT controllability in future system cards, starting with GPT-5.
(Source: OpenAI)