Advancing voice intelligence with new models in the API
Summary
OpenAI has launched three new audio models in its API to enable a new generation of voice applications:

- GPT-Realtime-2 brings GPT-5-class reasoning to natural conversations and complex requests. It features improved context windows, recovery behavior, tone control, and reasoning, with significant gains on audio-intelligence and instruction-following benchmarks.
- GPT-Realtime-Translate provides live speech translation across more than 70 input and 13 output languages while keeping pace with the speaker, enabling seamless multilingual voice experiences.
- GPT-Realtime-Whisper is a streaming speech-to-text model for low-latency live transcription, aimed at live business workflows that need immediate transcripts.

Together, these models are intended to move voice interfaces beyond simple responses toward agents that can listen, reason, translate, transcribe, and act in real time, supporting patterns such as voice-to-action, system-to-voice guidance, and voice-to-voice communication. The Realtime API includes safety measures, and pricing details for each model are provided.
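As a rough illustration of how such a model might be configured through the Realtime API, the sketch below builds a `session.update` event payload. This is an assumption-laden example: the model names (`gpt-realtime-2`, `gpt-realtime-whisper`) are taken from the announcement, and the event and field names follow OpenAI's previously published Realtime API conventions, which may differ for these new models.

```python
import json

def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Return a JSON-encoded session.update event for a realtime voice session.

    A minimal sketch, assuming the new models reuse the existing Realtime API
    session schema; names and fields here are illustrative, not authoritative.
    """
    event = {
        "type": "session.update",
        "session": {
            # Accept and emit both audio and text in the same session.
            "modalities": ["audio", "text"],
            # System-style steering, e.g. the tone control the article mentions.
            "instructions": instructions,
            "voice": voice,
            # Live transcription of incoming audio; model name assumed
            # from the article's GPT-Realtime-Whisper announcement.
            "input_audio_transcription": {"model": "gpt-realtime-whisper"},
        },
    }
    return json.dumps(event)

if __name__ == "__main__":
    payload = build_session_update("Answer concisely and keep a friendly tone.")
    print(json.loads(payload)["type"])
```

In practice this payload would be sent over the API's realtime connection after the session opens; here it only demonstrates the shape of the configuration.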
(Source: OpenAI)