XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment

Posted February 24, 2026 by XAI Tech Team · 3 min read

This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:

  1. Responses WebSocket mode (wss://.../v1/responses)
  2. Realtime WebSocket sessions (wss://.../v1/realtime)
  3. Coexistence with existing HTTP APIs without changing normal HTTP behavior

OpenAI WebSocket Mode: Key Semantics

According to OpenAI's official guide, core semantics for Responses WebSocket mode are:

  1. Keep a persistent connection to /v1/responses
  2. Start each turn with response.create
  3. Continue context with previous_response_id plus incremental input
  4. Sequential execution per connection: only one in-flight response at a time (no multiplexing)
  5. Connection lifetime limit of 60 minutes, then reconnect

How XAI Router Aligns

1) Path compatibility

XAI Router supports both path variants for easier client migration:

  • /v1/responses and /responses
  • /v1/realtime and /realtime

2) Same sequential model as OpenAI

For /v1/responses in WebSocket mode:

  • Multiple response.create events are allowed over one connection
  • But they must be sequential
  • Concurrent in-flight response.create events on the same connection are rejected

This matches OpenAI's documented single-connection sequential behavior.
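The sequential rule can be enforced client-side with a small in-flight guard. This is a sketch under the assumption that a turn ends on one of the documented terminal events; the class and method names are illustrative:

```python
class TurnGate:
    """Tracks whether a response is in flight on one connection.
    Illustrative only: terminal event names follow the Responses
    events (response.completed / response.failed / response.incomplete)."""

    TERMINAL = {"response.completed", "response.failed", "response.incomplete"}

    def __init__(self):
        self.in_flight = False

    def begin_turn(self):
        # Mirrors the server-side rule: a second response.create while
        # one is in flight on the same connection is rejected.
        if self.in_flight:
            raise RuntimeError("concurrent response.create on one connection")
        self.in_flight = True

    def observe(self, event_type: str):
        # A terminal event frees the connection for the next turn.
        if event_type in self.TERMINAL:
            self.in_flight = False
```

Calling begin_turn before each response.create and observe on every received event keeps the client from tripping the router's rejection path.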

3) Conversation-state transparency

Fields like previous_response_id, incremental input, and store=false are preserved as conversation semantics. XAI Router focuses on model mapping, ACL checks, rate limits, routing, and usage accounting around them.
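Concretely, a follow-up turn continues the conversation by carrying previous_response_id plus only the new input. A sketch of such a payload builder is below; the model name is an example and the helper name is an assumption:

```python
def next_turn_payload(previous_response_id: str, user_text: str) -> dict:
    """Build a follow-up response.create that continues the prior
    conversation via previous_response_id plus incremental input.
    Illustrative helper; gpt-5.4 is just an example model name."""
    return {
        "type": "response.create",
        "model": "gpt-5.4",
        # Links this turn to the previous response's id, so only the
        # new user message needs to be sent.
        "previous_response_id": previous_response_id,
        "input": [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": user_text}],
        }],
    }
```

XAI Router relays these fields untouched, applying routing and accounting around them rather than rewriting them.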


Unified WebSocket Architecture

This support is implemented through a unified framework (not endpoint-specific patches):

  1. ws_framework: session lifecycle, relay, timeout control, and error handling
  2. openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, usage finalize
  3. openai-realtime-ws adapter: realtime event relay and session usage tracking

The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
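One way to picture the split between ws_framework and the endpoint adapters is a small hook interface: the framework owns the session and relay loop, while each adapter only transforms and observes events. The sketch below is purely illustrative and does not claim to match XAI Router's actual internals:

```python
from typing import Protocol

class WSAdapter(Protocol):
    """Per-endpoint hooks the shared ws_framework would call.
    Illustrative interface, not XAI Router's real one."""
    def on_client_event(self, event: dict) -> dict: ...
    def on_upstream_event(self, event: dict) -> dict: ...
    def on_close(self) -> None: ...

class ResponsesAdapter:
    """Sketch of the openai-responses-ws adapter role: bind the
    response id from upstream events for usage finalization."""

    def __init__(self):
        self.current_response_id = None

    def on_client_event(self, event: dict) -> dict:
        return event  # relay response.create unchanged

    def on_upstream_event(self, event: dict) -> dict:
        # Capture the response-id binding when a turn starts.
        if event.get("type") == "response.created":
            self.current_response_id = event.get("response", {}).get("id")
        return event

    def on_close(self) -> None:
        self.current_response_id = None
```

A Realtime adapter would implement the same three hooks with its own event handling, which is what lets both endpoints share one session/relay loop.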

[Diagram: XAI Router OpenAI WebSocket alignment]

The diagram reflects the unified WS design: preserve OpenAI behavior while converging Responses and Realtime into one session/relay framework.


Minimal Responses WebSocket Example

The following example opens a connection via XAI Router and creates one gpt-5.4 response:

from websocket import create_connection  # pip install websocket-client
import json
import os

# Open one persistent connection; all turns on it run sequentially.
ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

# Start a turn. store=False keeps the response out of server-side storage.
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.4",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

# Drain events until the turn reaches a terminal state.
while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Codex CLI Config (Reference Baseline)

If you use Codex CLI with XAI Router, this is a working reference baseline config:

model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "detailed"
model_verbosity = "high"
approval_policy = "never"
sandbox_mode = "danger-full-access"
service_tier = "fast"
suppress_unstable_features_warning = true

[model_providers.xai]
name = "xai"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"

[features]
multi_agent = true

Notes:

  1. This can be used as a reference baseline for ~/.codex/config.toml.
  2. Older examples used explicit supports_websockets and responses_websockets_v2 flags; if your Codex build still exposes those switches, add them back according to that build's docs.
  3. Restart your Codex session after updating the config.

Performance and Stability Notes

Without changing external behavior, the implementation includes practical optimizations:

  1. Lightweight event-type prefilter before full JSON unmarshal on hot paths
  2. Shared relay framework for Responses and Realtime to reduce duplicated logic
  3. Cleaner connection-error handling with reduced log noise for expected disconnect patterns

Result: better maintainability and stable WS behavior while preserving existing HTTP behavior.


Conclusion

If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.

XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.

