How OpenAI delivers low-latency voice AI at scale

OpenAI rearchitected its WebRTC infrastructure using a split relay and transceiver model to provide low-latency voice AI for millions of users.

Summary

OpenAI developed a custom "split relay plus transceiver" architecture to deliver high-performance, low-latency voice AI. By offloading WebRTC session management to specialized transceivers and routing media packets through a lightweight relay layer, the team avoided the complexity of exposing massive UDP port ranges from Kubernetes. This design preserves standard WebRTC compatibility while ensuring efficient global connectivity and scalable real-time performance for ChatGPT Voice and the Realtime API.

(Source: OpenAI)