XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment

Posted February 24, 2026 by XAI Tech Team · 3 min read

This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:

  1. Responses WebSocket mode (wss://.../v1/responses)
  2. Realtime WebSocket sessions (wss://.../v1/realtime)
  3. Coexistence with existing HTTP APIs without changing normal HTTP behavior

OpenAI WebSocket Mode: Key Semantics

According to OpenAI's official guide, core semantics for Responses WebSocket mode are:

  1. Keep a persistent connection to /v1/responses
  2. Start each turn with response.create
  3. Continue context with previous_response_id plus incremental input
  4. Sequential execution per connection: only one in-flight response at a time (no multiplexing)
  5. Connection lifetime limit of 60 minutes, then reconnect

How XAI Router Aligns

1) Path compatibility

XAI Router supports both path variants for easier client migration:

  • /v1/responses and /responses
  • /v1/realtime and /realtime

2) Same sequential model as OpenAI

For /v1/responses in WebSocket mode:

  • Multiple response.create events are allowed over one connection
  • But they must be sequential
  • Concurrent in-flight response.create events on the same connection are rejected

This matches OpenAI's documented single-connection sequential behavior.
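The sequential rule can be enforced client-side with a small in-flight guard. This is a sketch under the assumption that a turn ends on one of the documented terminal events; the class and method names are illustrative:

```python
class TurnGate:
    """Tracks whether a response is in flight on one connection.
    Illustrative only: terminal event names follow the Responses
    events (response.completed / response.failed / response.incomplete)."""

    TERMINAL = {"response.completed", "response.failed", "response.incomplete"}

    def __init__(self):
        self.in_flight = False

    def begin_turn(self):
        # Mirrors the server-side rule: a second response.create while
        # one is in flight on the same connection is rejected.
        if self.in_flight:
            raise RuntimeError("concurrent response.create on one connection")
        self.in_flight = True

    def observe(self, event_type: str):
        # A terminal event frees the connection for the next turn.
        if event_type in self.TERMINAL:
            self.in_flight = False
```

Calling begin_turn before each response.create and observe on every received event keeps the client from tripping the router's rejection path.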

3) Conversation-state transparency

Fields like previous_response_id, incremental input, and store=false are preserved as conversation semantics. XAI Router focuses on model mapping, ACL checks, rate limits, routing, and usage accounting around them.
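Concretely, a follow-up turn continues the conversation by carrying previous_response_id plus only the new input. A sketch of such a payload builder is below; the model name is an example and the helper name is an assumption:

```python
def next_turn_payload(previous_response_id: str, user_text: str) -> dict:
    """Build a follow-up response.create that continues the prior
    conversation via previous_response_id plus incremental input.
    Illustrative helper; gpt-5.4 is just an example model name."""
    return {
        "type": "response.create",
        "model": "gpt-5.4",
        # Links this turn to the previous response's id, so only the
        # new user message needs to be sent.
        "previous_response_id": previous_response_id,
        "input": [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": user_text}],
        }],
    }
```

XAI Router relays these fields untouched, applying routing and accounting around them rather than rewriting them.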


Unified WebSocket Architecture

This support is implemented through a unified framework (not endpoint-specific patches):

  1. ws_framework: session lifecycle, relay, timeout control, and error handling
  2. openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, usage finalize
  3. openai-realtime-ws adapter: realtime event relay and session usage tracking

The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
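One way to picture the split between ws_framework and the endpoint adapters is a small hook interface: the framework owns the session and relay loop, while each adapter only transforms and observes events. The sketch below is purely illustrative and does not claim to match XAI Router's actual internals:

```python
from typing import Protocol

class WSAdapter(Protocol):
    """Per-endpoint hooks the shared ws_framework would call.
    Illustrative interface, not XAI Router's real one."""
    def on_client_event(self, event: dict) -> dict: ...
    def on_upstream_event(self, event: dict) -> dict: ...
    def on_close(self) -> None: ...

class ResponsesAdapter:
    """Sketch of the openai-responses-ws adapter role: bind the
    response id from upstream events for usage finalization."""

    def __init__(self):
        self.current_response_id = None

    def on_client_event(self, event: dict) -> dict:
        return event  # relay response.create unchanged

    def on_upstream_event(self, event: dict) -> dict:
        # Capture the response-id binding when a turn starts.
        if event.get("type") == "response.created":
            self.current_response_id = event.get("response", {}).get("id")
        return event

    def on_close(self) -> None:
        self.current_response_id = None
```

A Realtime adapter would implement the same three hooks with its own event handling, which is what lets both endpoints share one session/relay loop.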

[Diagram: XAI Router OpenAI WebSocket alignment]

The diagram reflects the unified WS design: preserve OpenAI behavior while converging Responses and Realtime into one session/relay framework.


Minimal Responses WebSocket Example

The following example opens a connection via XAI Router and creates one gpt-5.4 response:

from websocket import create_connection  # pip install websocket-client
import json
import os

# Open one persistent connection; all turns on it run sequentially.
ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

# Start a turn. store=False keeps the response out of server-side storage.
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.4",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

# Drain events until the turn reaches a terminal state.
while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Codex CLI Config (Reference Baseline)

If you use Codex CLI with XAI Router, this is a working reference baseline config:

model_provider = "xai"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "detailed"
model_verbosity = "high"
approval_policy = "never"
sandbox_mode = "danger-full-access"
service_tier = "fast"
suppress_unstable_features_warning = true

[model_providers.xai]
name = "xai"
base_url = "https://api.xairouter.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"

[features]
multi_agent = true

Notes:

  1. This can be used as a reference baseline for ~/.codex/config.toml.
  2. Older examples used explicit supports_websockets and responses_websockets_v2 flags; if your Codex build still exposes those switches, add them back according to that build's docs.
  3. Restart your Codex session after updating the config.

Performance and Stability Notes

Without changing external behavior, the implementation includes practical optimizations:

  1. Lightweight event-type prefilter before full JSON unmarshal on hot paths
  2. Shared relay framework for Responses and Realtime to reduce duplicated logic
  3. Cleaner connection-error handling with reduced log noise for expected disconnect patterns

Result: better maintainability and stable WS behavior while preserving existing HTTP behavior.


Conclusion

If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.

XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.

