Speeding up agentic workflows with WebSockets in the Responses API
Summary
To address latency bottlenecks in agentic workflows caused by repeated per-request API overhead, OpenAI implemented persistent WebSocket connections for its Responses API. By caching conversation state server-side and cutting redundant network round trips, the update allows models like GPT-5.3-Codex-Spark to reach speeds of over 1,000 tokens per second. This architectural shift eliminates the need to rebuild context for every follow-up request, yielding significant performance gains for developers and platforms such as Vercel, Cline, and Cursor.
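The saving described above has two parts: the connection handshake is paid once per WebSocket rather than once per request, and cached conversation state means each follow-up carries only the new input instead of the whole history. The second effect can be sketched conceptually; all names below are hypothetical illustrations, not the real Responses API:

```python
# Conceptual sketch: request payload sizes with and without server-side
# conversation-state caching. Hypothetical helpers, not real API calls.

def stateless_payloads(turns):
    """Without cached state, each request resends the entire conversation."""
    history = []
    sizes = []
    for msg in turns:
        history.append(msg)
        sizes.append(sum(len(m) for m in history))  # full context every time
    return sizes

def stateful_payloads(turns):
    """With cached state, each request sends only the new message."""
    return [len(msg) for msg in turns]

turns = ["plan the task", "run step one", "run step two", "summarize"]
print(stateless_payloads(turns))  # grows with conversation length
print(stateful_payloads(turns))   # proportional to the new input only
```

The stateless payloads grow roughly quadratically over a session, while the cached-state payloads stay flat, which is where the per-follow-up latency win comes from.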
(Source: OpenAI)