Predicting model behavior before release by simulating deployment
Summary
Deployment Simulation is a pre-deployment safety method that replays past user conversations through a candidate model and observes its responses in realistic, non-adversarial contexts. Because it draws on representative production traffic, the technique helps surface novel misaligned behaviors, makes it less likely that the model recognizes it is being tested, and yields quantitative estimates of how often undesired behaviors occur. It complements traditional red-teaming and adversarial evaluations by showing more accurately how a model behaves under real-world conditions. It is not a replacement for tail-risk analysis, however: it is most effective for behaviors that occur often enough to show up in a sample of real traffic.
(Source: OpenAI)
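The summary above mentions two quantitative points: the method yields estimates of undesired-behavior rates, and it works best for behaviors that are frequent enough. The following is a minimal, hypothetical sketch of both ideas, not OpenAI's implementation. It assumes the candidate model and a behavior grader are plain Python callables (`candidate_model`, `grader`, and `simulate_deployment` are illustrative names, not a real API), and it uses a Wilson score interval, which is our choice and not something the source specifies, to attach uncertainty to the observed rate. When zero conversations are flagged out of `n`, the 95% upper bound is roughly `3.84 / n`. That is why behaviors rarer than about one in `n` cannot be ruled out by a sample of size `n`, and why tail risks still need separate analysis.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class SimulationResult:
    flagged: int  # conversations where the grader detected the undesired behavior
    total: int    # conversations replayed


def simulate_deployment(
    conversations: Sequence[str],
    candidate_model: Callable[[str], str],  # hypothetical: maps a past conversation to a new response
    grader: Callable[[str, str], bool],     # hypothetical: True if the response shows the behavior
) -> SimulationResult:
    """Replay each past conversation through the candidate model and count flagged responses."""
    flagged = 0
    for convo in conversations:
        response = candidate_model(convo)
        if grader(convo, response):
            flagged += 1
    return SimulationResult(flagged=flagged, total=len(conversations))


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial rate k/n (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("need at least one replayed conversation")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def toy_model(convo: str) -> str:
    # Toy stand-in: "misbehaves" on every conversation whose id ends in 7.
    return "UNSAFE" if convo.endswith("7") else "ok"


def toy_grader(convo: str, response: str) -> bool:
    # Toy stand-in for a behavior classifier.
    return response == "UNSAFE"


if __name__ == "__main__":
    convos = [f"user message {i}" for i in range(1000)]
    result = simulate_deployment(convos, toy_model, toy_grader)
    low, high = wilson_interval(result.flagged, result.total)
    print(f"flagged {result.flagged}/{result.total}, 95% CI [{low:.4f}, {high:.4f}]")

    # With zero observations the interval still has a nonzero upper bound (~3.84/n).
    print("zero-hit upper bound:", wilson_interval(0, result.total)[1])
```

The design choice here is deliberately simple: a frequent behavior (100 of 1,000 toy conversations) gets a tight interval, while a behavior that never appears only gets an upper bound, which illustrates the summary's caveat about rare behaviors.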