XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment
Posted February 24, 2026 by XAI Tech Team · 3 min read

This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:
- Responses WebSocket mode (wss://.../v1/responses)
- Realtime WebSocket sessions (wss://.../v1/realtime)
- Coexistence with existing HTTP APIs, without changing normal HTTP behavior
OpenAI WebSocket Mode: Key Semantics
According to OpenAI's official guide, core semantics for Responses WebSocket mode are:
- Keep a persistent connection to /v1/responses
- Start each turn with response.create
- Continue context with previous_response_id plus incremental input
- Sequential execution per connection: only one in-flight response at a time (no multiplexing)
- Connection lifetime limit of 60 minutes, then reconnect
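The 60-minute lifetime limit means clients should reconnect proactively rather than waiting for the server to drop the socket. A minimal client-side sketch of that idea (class name, margin, and the connect callable are illustrative, not part of XAI Router):

```python
import time

SESSION_LIMIT_S = 60 * 60  # documented 60-minute connection lifetime


class ReconnectingSession:
    """Sketch: track connection age and reconnect before the lifetime limit.

    connect_fn is any callable that returns a fresh WebSocket connection.
    Conversation context survives reconnects because it is carried by
    previous_response_id, not by the socket itself.
    """

    def __init__(self, connect_fn, limit_s=SESSION_LIMIT_S, margin_s=60):
        self._connect_fn = connect_fn
        self._deadline = limit_s - margin_s  # reconnect with a safety margin
        self._ws = None
        self._opened_at = 0.0

    def get(self):
        # Reconnect lazily when the connection is missing or too old.
        if self._ws is None or time.monotonic() - self._opened_at > self._deadline:
            self._ws = self._connect_fn()
            self._opened_at = time.monotonic()
        return self._ws
```

A caller would wrap its `create_connection` call in `connect_fn` and fetch the socket through `get()` before each turn.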
How XAI Router Aligns
1) Path compatibility
XAI Router supports both path variants for easier client migration:
- /v1/responses and /responses
- /v1/realtime and /realtime
2) Same sequential model as OpenAI
For /v1/responses in WebSocket mode:
- Multiple response.create events are allowed over one connection
- But they must be sequential
- Concurrent in-flight response.create events on the same connection are rejected
This matches OpenAI's documented single-connection sequential behavior.
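Client code can enforce the same one-in-flight rule locally instead of relying on the server-side rejection. A hypothetical guard (not part of XAI Router's API):

```python
import threading


class SequentialTurnGuard:
    """Client-side guard: allow only one in-flight response.create per connection."""

    def __init__(self):
        self._in_flight = False
        self._lock = threading.Lock()

    def begin_turn(self):
        with self._lock:
            if self._in_flight:
                raise RuntimeError("a response is already in flight on this connection")
            self._in_flight = True

    def end_turn(self):
        with self._lock:
            self._in_flight = False


guard = SequentialTurnGuard()
guard.begin_turn()       # ok: first turn starts
rejected = False
try:
    guard.begin_turn()   # concurrent turn on the same connection
except RuntimeError:
    rejected = True      # rejected, matching the server-side rule
guard.end_turn()
guard.begin_turn()       # ok again after the previous turn finished
```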
3) Conversation-state transparency
Fields like previous_response_id, incremental input, and store=false are preserved as conversation semantics. XAI Router focuses on model mapping, ACL checks, rate limits, routing, and usage accounting around them.
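The continuation semantics pass through the router as plain payload fields, which can be sketched as payload construction (the helper function and the response id below are illustrative):

```python
def make_turn_event(model, text, previous_response_id=None):
    """Build a response.create event; pass previous_response_id to continue context."""
    event = {
        "type": "response.create",
        "model": model,
        "input": [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        }],
    }
    if previous_response_id is not None:
        event["previous_response_id"] = previous_response_id
    return event


# First turn: no previous_response_id.
first = make_turn_event("gpt-5.2", "Hello")
# Second turn: chain the context using the id returned in the first
# turn's response.completed event (id shown here is made up).
second = make_turn_event("gpt-5.2", "And in French?", previous_response_id="resp_123")
```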
Unified WebSocket Architecture
This support is implemented through a unified framework (not endpoint-specific patches):
- ws_framework: session lifecycle, relay, timeout control, and error handling
- openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, usage finalization
- openai-realtime-ws adapter: realtime event relay and session usage tracking
The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
The unified WS design preserves OpenAI behavior while converging Responses and Realtime into one session/relay framework.
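One way such a framework/adapter split might look in code (a hypothetical sketch in Python for brevity, not XAI Router's actual implementation):

```python
class WSAdapter:
    """Hypothetical adapter base: the shared framework owns the connection
    lifecycle; adapters own per-endpoint event semantics."""

    def on_client_event(self, event):
        raise NotImplementedError


class ResponsesAdapter(WSAdapter):
    """Tracks turn lifecycle for response.create events."""

    def __init__(self):
        self.turns = 0

    def on_client_event(self, event):
        if event.get("type") == "response.create":
            self.turns += 1  # one turn starts per response.create


class RealtimeAdapter(WSAdapter):
    """Relays realtime events as-is; no turn concept."""

    def on_client_event(self, event):
        pass


# A single framework-level routing table replaces per-endpoint server code.
ADAPTERS = {
    "/v1/responses": ResponsesAdapter,
    "/v1/realtime": RealtimeAdapter,
}
```

The point of the split is that timeout control, relay, and error handling live once in the framework, while each adapter stays a small state machine over endpoint-specific events.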
Minimal Responses WebSocket Example
The following example opens a connection via XAI Router and creates one gpt-5.2 response:
from websocket import create_connection
import json
import os
# Open an authenticated WebSocket connection through XAI Router
ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

# Start one turn with a response.create event
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.2",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

# Stream events until the response reaches a terminal state
while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Performance and Stability Notes
Without changing external behavior, the implementation includes practical optimizations:
- Lightweight event-type prefilter before full JSON unmarshal on hot paths
- Shared relay framework for Responses and Realtime to reduce duplicated logic
- Cleaner connection-error handling with reduced log noise for expected disconnect patterns
Result: better maintainability and stable WS behavior while preserving existing HTTP behavior.
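The event-type prefilter mentioned above can be sketched as follows (hypothetical; the actual hot event names and matching strategy in XAI Router may differ):

```python
import json

# Event types worth a full parse on the hot path (illustrative list).
HOT_EVENT_TYPES = (b'"response.output_text.delta"', b'"response.completed"')


def maybe_parse(frame: bytes):
    """Prefilter: skip full JSON decoding for frames that cannot contain a
    hot event type. Sketch only: substring matching can false-positive, so
    treat it as a fast reject, never as a classifier."""
    if not any(t in frame for t in HOT_EVENT_TYPES):
        return None  # cheap reject without a full unmarshal
    return json.loads(frame)


# A delta frame passes the prefilter and is parsed...
evt = maybe_parse(b'{"type": "response.output_text.delta", "delta": "hi"}')
# ...while an unrelated frame is rejected without a full unmarshal.
skipped = maybe_parse(b'{"type": "response.created"}')
```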
Conclusion
If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.
XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.
References
- OpenAI WebSocket Mode: https://developers.openai.com/api/docs/guides/websocket-mode
- OpenAI Realtime WebSocket guide: https://platform.openai.com/docs/guides/realtime-websocket
- OpenAI Responses API reference: https://platform.openai.com/docs/api-reference/responses/create