XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment

Posted February 24, 2026 by XAI Tech Team · 3 min read

This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:

  1. Responses WebSocket mode (wss://.../v1/responses)
  2. Realtime WebSocket sessions (wss://.../v1/realtime)
  3. Coexistence with existing HTTP APIs without changing normal HTTP behavior

OpenAI WebSocket Mode: Key Semantics

According to OpenAI's official guide, core semantics for Responses WebSocket mode are:

  1. Keep a persistent connection to /v1/responses
  2. Start each turn with response.create
  3. Continue context with previous_response_id plus incremental input
  4. Sequential execution per connection: only one in-flight response at a time (no multiplexing)
  5. Connection lifetime limit of 60 minutes, then reconnect

How XAI Router Aligns

1) Path compatibility

XAI Router supports both path variants for easier client migration:

  • /v1/responses and /responses
  • /v1/realtime and /realtime

2) Same sequential model as OpenAI

For /v1/responses in WebSocket mode:

  • Multiple response.create events are allowed over one connection
  • But they must be sequential
  • Concurrent in-flight response.create events on the same connection are rejected

This matches OpenAI's documented single-connection sequential behavior.
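
Since the router rejects concurrent in-flight response.create events, a client can enforce the same rule locally and fail fast. The class below is a hypothetical guard of our own design, not part of any SDK:

```python
import json

class SequentialTurnGuard:
    """Client-side guard: permit a new response.create only after the
    previous turn reached a terminal event on this connection."""

    TERMINAL = {"response.completed", "response.failed", "response.incomplete"}

    def __init__(self) -> None:
        self._in_flight = False

    def begin_turn(self, event: dict) -> str:
        """Serialize a response.create for sending; raise if one is in flight."""
        if event.get("type") != "response.create":
            raise ValueError("begin_turn expects a response.create event")
        if self._in_flight:
            raise RuntimeError("a response is already in flight on this connection")
        self._in_flight = True
        return json.dumps(event)

    def observe(self, event: dict) -> None:
        """Feed every received event here; terminal events end the turn."""
        if event.get("type") in self.TERMINAL:
            self._in_flight = False
```

Wrap ws.send with begin_turn and pass each received event to observe; pipelined sends then raise locally instead of being rejected by the router.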

3) Conversation-state transparency

Fields like previous_response_id, incremental input, and store=false are passed through unchanged as conversation semantics; XAI Router layers model mapping, ACL checks, rate limits, routing, and usage accounting around them.


Unified WebSocket Architecture

This support is implemented through a unified framework (not endpoint-specific patches):

  1. ws_framework: session lifecycle, relay, timeout control, and error handling
  2. openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, and usage finalization
  3. openai-realtime-ws adapter: realtime event relay and session usage tracking

The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
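
To make the shape of this split concrete, here is a rough sketch of what such an adapter contract could look like. All names and signatures here are illustrative assumptions, not the actual XAI Router internals:

```python
from abc import ABC, abstractmethod

class WSAdapter(ABC):
    """Sketch of a framework/adapter split: the framework owns session
    lifecycle, relay, and timeouts; each endpoint supplies an adapter
    that interprets events for its protocol."""

    @abstractmethod
    def on_client_event(self, event: dict) -> dict:
        """Validate/transform a client event before relaying upstream."""

    @abstractmethod
    def on_upstream_event(self, event: dict) -> bool:
        """Handle an upstream event; return True when the turn is done."""

class ResponsesAdapter(WSAdapter):
    """Hypothetical Responses-mode adapter: turns end on terminal events."""

    TERMINAL = {"response.completed", "response.failed", "response.incomplete"}

    def on_client_event(self, event: dict) -> dict:
        if event.get("type") != "response.create":
            raise ValueError("unexpected client event for /v1/responses")
        return event

    def on_upstream_event(self, event: dict) -> bool:
        return event.get("type") in self.TERMINAL
```

A Realtime adapter would plug into the same base class with its own event handling, which is what allows one relay loop to serve both endpoints.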

(Diagram: XAI Router OpenAI WebSocket alignment)

This diagram reflects the unified WS design: preserve OpenAI behavior while converging Responses and Realtime into one session/relay framework.


Minimal Responses WebSocket Example

The following example opens a connection via XAI Router and creates one gpt-5.2 response:

from websocket import create_connection  # pip install websocket-client
import json
import os

# Open a persistent connection; auth is passed as a standard header.
ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

# Start one turn. store=False keeps the exchange out of server-side storage.
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.2",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

# Read events until this turn reaches a terminal state.
while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Performance and Stability Notes

Without changing external behavior, the implementation includes practical optimizations:

  1. Lightweight event-type prefilter before full JSON unmarshal on hot paths
  2. Shared relay framework for Responses and Realtime to reduce duplicated logic
  3. Cleaner connection-error handling with reduced log noise for expected disconnect patterns

The result: better maintainability and stable WebSocket behavior, with existing HTTP semantics left untouched.
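
Optimization 1 above can be sketched as a cheap byte-level check that skips full JSON decoding for the dominant streaming-delta events. The marker and function below are illustrative, not the router's actual code:

```python
import json

# High-frequency event type on the hot path. The substring check assumes
# compact JSON (no space after the colon) as commonly emitted upstream;
# if that assumption fails, we simply fall back to a full parse.
_DELTA_MARKER = b'"type":"response.output_text.delta"'

def classify(raw: bytes) -> str:
    """Return the event type, avoiding json.loads for the common case."""
    if _DELTA_MARKER in raw[:128]:  # "type" usually appears near the front
        return "response.output_text.delta"
    return json.loads(raw).get("type", "unknown")
```

Because the fast path only short-circuits when the marker matches, misses cost one substring scan and behave identically to the unoptimized path.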


Conclusion

If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.

XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.

