XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment
Posted February 24, 2026 by XAI Tech Team · 4 min read
This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:
- Responses WebSocket mode (wss://.../v1/responses)
- Realtime WebSocket sessions (wss://.../v1/realtime)
- Coexistence with existing HTTP APIs without changing normal HTTP behavior
OpenAI WebSocket Mode: Key Semantics
According to OpenAI's official guide, core semantics for Responses WebSocket mode are:
- Keep a persistent connection to /v1/responses
- Start each turn with response.create
- Continue context with previous_response_id plus incremental input
- Sequential execution per connection: only one in-flight response at a time (no multiplexing)
- Connection lifetime limit of 60 minutes, then reconnect
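Under these semantics, a multi-turn exchange is just a sequence of response.create payloads, each later turn chained to the previous one by previous_response_id. A minimal sketch of the payload construction (make_turn_event is a hypothetical helper for illustration, not part of any SDK, and the response id shown is made up):

```python
import json

def make_turn_event(model, text, previous_response_id=None):
    """Build one response.create payload; later turns chain via previous_response_id."""
    event = {
        "type": "response.create",
        "model": model,
        "input": [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        }],
    }
    if previous_response_id is not None:
        # Continue server-side context instead of resending the full history.
        event["previous_response_id"] = previous_response_id
    return event

# Turn 1: no prior context.
first = make_turn_event("gpt-5.4", "Explain websocket mode.")
# Turn 2: chain to the id returned for turn 1 (illustrative value).
second = make_turn_event("gpt-5.4", "And in one sentence?",
                         previous_response_id="resp_abc123")
print(json.dumps(second, indent=2))
```

Because each turn only carries incremental input, the connection must stay open between turns; after the 60-minute lifetime limit, the client reconnects and continues from the last response id.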
How XAI Router Aligns
1) Path compatibility
XAI Router supports both path variants for easier client migration:
- /v1/responses and /responses
- /v1/realtime and /realtime
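One common way to implement this kind of dual-path support is a normalization step at the WS upgrade handler that maps both variants onto one canonical endpoint. The sketch below is illustrative, not XAI Router's actual code:

```python
# Map both accepted variants onto one canonical endpoint before routing.
CANONICAL_WS_PATHS = {
    "/v1/responses": "/v1/responses",
    "/responses": "/v1/responses",
    "/v1/realtime": "/v1/realtime",
    "/realtime": "/v1/realtime",
}

def normalize_ws_path(path):
    """Return the canonical endpoint, or None for unsupported paths."""
    return CANONICAL_WS_PATHS.get(path.rstrip("/") or "/")

print(normalize_ws_path("/responses"))  # -> /v1/responses
```

Normalizing early means every later stage (ACL, rate limits, usage accounting) sees exactly one path per endpoint.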
2) Same sequential model as OpenAI
For /v1/responses in WebSocket mode:
- Multiple response.create events are allowed over one connection
- But they must be sequential
- Concurrent in-flight response.create events on the same connection are rejected
This matches OpenAI's documented single-connection sequential behavior.
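On the client side, the simplest way to respect this rule is to gate sends on turn completion. A hypothetical guard (an illustration, not an SDK feature) could look like:

```python
class TurnGate:
    """Rejects a new response.create while one is still in flight,
    mirroring the one-in-flight-response-per-connection rule."""

    def __init__(self):
        self._in_flight = False

    def begin_turn(self):
        if self._in_flight:
            raise RuntimeError("a response is already in flight on this connection")
        self._in_flight = True

    def finish_turn(self):
        # Call when response.completed / response.failed / response.incomplete arrives.
        self._in_flight = False

gate = TurnGate()
gate.begin_turn()          # first turn: allowed
try:
    gate.begin_turn()      # concurrent turn: rejected, like the router would
except RuntimeError as exc:
    print("rejected:", exc)
gate.finish_turn()
gate.begin_turn()          # next sequential turn: allowed again
```

Workloads that genuinely need parallel responses should open multiple connections rather than multiplex on one.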
3) Conversation-state transparency
Fields like previous_response_id, incremental input, and store=false are preserved as conversation semantics. XAI Router focuses on model mapping, ACL checks, rate limits, routing, and usage accounting around them.
Unified WebSocket Architecture
This support is implemented through a unified framework (not endpoint-specific patches):
- ws_framework: session lifecycle, relay, timeout control, and error handling
- openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, usage finalization
- openai-realtime-ws adapter: realtime event relay and session usage tracking
The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
The unified WS design preserves OpenAI behavior while converging Responses and Realtime into one session/relay framework.
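Structurally, a framework like this usually reduces to one shared session loop plus per-endpoint adapters. The sketch below illustrates that shape only; it is not XAI Router's implementation, and the adapter classes are hypothetical stand-ins for the components listed above:

```python
class ResponsesAdapter:
    """Per-turn lifecycle: response.create in, terminal event relayed out."""
    name = "openai-responses-ws"

    def on_event(self, event):
        # Real adapter: bind response ids, finalize usage on terminal events.
        return f"relay:{event['type']}"

class RealtimeAdapter:
    """Continuous event relay with session-level usage tracking."""
    name = "openai-realtime-ws"

    def on_event(self, event):
        return f"relay:{event['type']}"

# The shared framework owns the connection and picks an adapter by endpoint,
# so timeout control and error handling live in exactly one place.
ADAPTERS = {
    "/v1/responses": ResponsesAdapter(),
    "/v1/realtime": RealtimeAdapter(),
}

def handle(path, event):
    return ADAPTERS[path].on_event(event)

print(handle("/v1/responses", {"type": "response.create"}))  # -> relay:response.create
```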
Minimal Responses WebSocket Example
The following example opens a connection via XAI Router and creates one gpt-5.4 response:
from websocket import create_connection
import json
import os

ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.4",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Codex CLI Config (Reference Baseline)
If you use Codex CLI with XAI Router, this is a working reference baseline config:
model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"
[model_providers.xai]
name = "OpenAI"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"

Notes:
- This can be used as a reference baseline for ~/.codex/config.toml.
- Older examples used explicit supports_websockets and responses_websockets_v2 flags; if your Codex build still exposes those switches, add them back according to that build's docs.
- env_key = "XAI_API_KEY" only tells Codex which environment variable to read. On Linux set the variable in ~/.bashrc, on macOS prefer ~/.zshrc, and on Windows set a user environment variable before reopening the shell. On some older macOS setups, legacy terminals, or IDE sessions that still inherit a bash login environment, also mirror the variable into ~/.bash_profile, and into ~/.bashrc if needed.
- Restart your Codex session after updating the config.
Performance and Stability Notes
Without changing external behavior, the implementation includes practical optimizations:
- Lightweight event-type prefilter before full JSON unmarshal on hot paths
- Shared relay framework for Responses and Realtime to reduce duplicated logic
- Cleaner connection-error handling with reduced log noise for expected disconnect patterns
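The first item can be sketched as a cheap byte probe that runs before a full JSON parse on the hot path. The exact mechanics below are illustrative (the markers assume compact wire JSON with no spaces after colons, which is how these APIs serialize frames):

```python
import json

# Raw byte markers for the terminal event types; checked before any parse.
TERMINAL_MARKERS = (
    b'"type":"response.completed"',
    b'"type":"response.failed"',
    b'"type":"response.incomplete"',
)

def is_terminal_fast(raw: bytes) -> bool:
    """Cheap substring scan; only frames that might be terminal pay for a full parse."""
    if not any(marker in raw for marker in TERMINAL_MARKERS):
        return False
    # Confirm with a real parse to rule out a marker appearing inside a string value.
    return json.loads(raw).get("type") in (
        "response.completed", "response.failed", "response.incomplete",
    )

frame = json.dumps({"type": "response.output_text.delta", "delta": "hi"},
                   separators=(",", ":")).encode()
print(is_terminal_fast(frame))  # -> False, without ever running json.loads
```

Since the overwhelming majority of streamed frames are deltas, skipping the full unmarshal for them is where the win comes from.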
Result: better maintainability and stable WS behavior while preserving existing HTTP behavior.
Conclusion
If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.
XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.
References
- OpenAI WebSocket Mode: https://developers.openai.com/api/docs/guides/websocket-mode
- OpenAI Realtime WebSocket guide: https://platform.openai.com/docs/guides/realtime-websocket
- OpenAI Responses API reference: https://platform.openai.com/docs/api-reference/responses/create