Advancing voice intelligence with new models in the API
Summary
OpenAI has launched three new audio models in its API to enable a new generation of voice applications:

- GPT-Realtime-2 brings GPT-5-class reasoning to natural conversations and complex requests. It features improved context windows, recovery behavior, tone control, and reasoning, with significant gains on audio-intelligence and instruction-following benchmarks.
- GPT-Realtime-Translate provides live speech translation across more than 70 input and 13 output languages while keeping pace with the speaker, enabling seamless multilingual voice experiences.
- GPT-Realtime-Whisper is a streaming speech-to-text model for low-latency live transcription, aimed at live business workflows that need immediate transcripts.

Together, these models are intended to move voice interfaces beyond simple responses toward agents that can listen, reason, translate, transcribe, and act in real time, supporting patterns such as voice-to-action, system-to-voice guidance, and voice-to-voice communication. The Realtime API includes safety measures, and pricing details for each model are provided.
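As a rough illustration of how such a model might be configured through the Realtime API, the sketch below builds a `session.update` event payload. This is an assumption-laden example: the model names (`gpt-realtime-2`, `gpt-realtime-whisper`) are taken from the announcement, and the event and field names follow OpenAI's previously published Realtime API conventions, which may differ for these new models.

```python
import json

def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Return a JSON-encoded session.update event for a realtime voice session.

    A minimal sketch, assuming the new models reuse the existing Realtime API
    session schema; names and fields here are illustrative, not authoritative.
    """
    event = {
        "type": "session.update",
        "session": {
            # Accept and emit both audio and text in the same session.
            "modalities": ["audio", "text"],
            # System-style steering, e.g. the tone control the article mentions.
            "instructions": instructions,
            "voice": voice,
            # Live transcription of incoming audio; model name assumed
            # from the article's GPT-Realtime-Whisper announcement.
            "input_audio_transcription": {"model": "gpt-realtime-whisper"},
        },
    }
    return json.dumps(event)

if __name__ == "__main__":
    payload = build_session_update("Answer concisely and keep a friendly tone.")
    print(json.loads(payload)["type"])
```

In practice this payload would be sent over the API's realtime connection after the session opens; here it only demonstrates the shape of the configuration.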
(Source: OpenAI)