WebSocket Protocol

Full protocol reference for the real-time WebSocket endpoint. Covers authentication, message types, binary mode, text buffering, and close codes.

Connection

wss://api.murmr.dev/v1/realtime

After connecting, send a config message with your API key within 10 seconds; otherwise the server closes the connection with code 4001 (AUTH_TIMEOUT). The server responds with config_ack immediately, so you can start sending text right away while auth completes in the background.

Parallel Authentication

The server sends config_ack before auth completes, saving ~200ms. Text sent during auth is queued and processed once the API key is validated. If auth fails, queued text is discarded and the connection closes with code 4002.

Plan Requirement

WebSocket is available on Realtime and Scale plans only. Other plans receive close code 4002 with a message indicating the required plan.

Client → Server Messages

config

First message after connecting. Authenticates and configures the voice.

Parameter | Type | Description
type (required) | "config" | Message type
api_key (required) | string | Your murmr API key (murmr_sk_live_... or murmr_sk_test_...)
voice_description | string | VoiceDesign description (e.g., "A warm, friendly voice"). Use this OR voice/voice_clone_prompt.
voice | string | Saved voice ID (e.g., "voice_abc123"). Requires voice_clone_prompt.
voice_clone_prompt | string | Base64-encoded voice prompt data from saved voice. Alternative to voice_description.
language | string | Full language name: English, Spanish, Portuguese, German, French, Italian, Chinese, Japanese, Korean, Russian, or "Auto"
JSON
// VoiceDesign mode
{
  "type": "config",
  "api_key": "murmr_sk_live_xxx",
  "voice_description": "A warm, professional narrator, calm and measured",
  "language": "English"
}

// Saved voice mode
{
  "type": "config",
  "api_key": "murmr_sk_live_xxx",
  "voice_clone_prompt": "BASE64_PROMPT_DATA...",
  "language": "English"
}
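
The two config shapes above can be produced by a small helper. This is a minimal sketch, not an official client: the function name build_config is hypothetical, and the client-side checks only mirror the parameter table (voice_description is an alternative to voice_clone_prompt, and voice requires voice_clone_prompt). Whether the server enforces the same rules is an assumption.

```python
import json


def build_config(api_key, voice_description=None, voice_clone_prompt=None,
                 voice=None, language=None):
    """Return the JSON text of a config message (hypothetical helper)."""
    # The parameter table describes these two as alternatives.
    if voice_description and voice_clone_prompt:
        raise ValueError("use voice_description OR voice_clone_prompt, not both")
    # The table says a saved voice ID requires voice_clone_prompt.
    if voice and not voice_clone_prompt:
        raise ValueError("voice requires voice_clone_prompt")

    msg = {"type": "config", "api_key": api_key}
    optional = {
        "voice_description": voice_description,
        "voice": voice,
        "voice_clone_prompt": voice_clone_prompt,
        "language": language,
    }
    # Only include the optional fields the caller actually set.
    msg.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(msg)
```

Usage: build_config("murmr_sk_test_xxx", voice_description="A warm narrator", language="English") returns the string to send as the first message.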

text

Send text to synthesize. Text is buffered server-side and generated at natural boundaries.

Parameter | Type | Description
type (required) | "text" | Message type
text (required) | string | Text to synthesize. Can be a single token or full sentence.
JSON
{"type": "text", "text": "Hello, "}

flush

Force immediate generation of all buffered text. Send this after the last text message to generate any remaining content.

JSON
{"type": "flush"}
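
When text comes from an LLM token stream, the text and flush messages combine naturally: one text message per token, then a single flush at the end. The generator below is a sketch under that reading; the name stream_messages is hypothetical, and skipping empty tokens is a client-side assumption rather than a documented server requirement.

```python
import json


def stream_messages(tokens):
    """Yield one text message per non-empty token, then a final flush."""
    for token in tokens:
        if token:  # assumption: empty strings are not worth sending
            yield json.dumps({"type": "text", "text": token})
    # Flush so the server generates whatever is still buffered.
    yield json.dumps({"type": "flush"})
```

Each yielded string is ready to send as a WebSocket text frame.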

binary_mode

Opt into raw PCM binary frames instead of base64 JSON. Saves ~50-100ms per chunk. Send after receiving config_ack.

JSON
{"type": "binary_mode"}

ping

Application-level keepalive. Server responds with pong.

JSON
{"type": "ping"}

Server → Client Messages

config_ack

Sent immediately after receiving config. Signals that the connection is accepted and text can be sent. Auth continues in the background.

JSON
{
  "type": "config_ack",
  "session_id": "a1b2c3d4"
}

binary_mode_ack

Confirms binary mode is enabled. Subsequent audio arrives as raw binary WebSocket frames.

JSON
{
  "type": "binary_mode_ack",
  "sample_rate": 24000,
  "format": "pcm_s16le"
}

audio (JSON mode)

Audio chunk with base64-encoded PCM. In binary mode, audio arrives as raw binary frames instead (no JSON wrapper).

JSON
{
  "type": "audio",
  "chunk": "SGVsbG8gV29ybGQh...",
  "sample_rate": 24000,
  "format": "pcm_s16le"
}
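
Decoding a JSON-mode audio message takes only the standard library. The sketch below assumes 16-bit mono samples (2 bytes per sample), which matches the pcm_s16le format shown above; the function name decode_audio_message is hypothetical.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # pcm_s16le, mono


def decode_audio_message(raw):
    """Return (pcm_bytes, duration_ms) for one JSON audio message."""
    msg = json.loads(raw)
    if msg.get("type") != "audio":
        raise ValueError(f"not an audio message: {msg.get('type')!r}")
    pcm = base64.b64decode(msg["chunk"])
    samples = len(pcm) // BYTES_PER_SAMPLE
    duration_ms = 1000 * samples / msg["sample_rate"]
    return pcm, duration_ms
```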

binary frame (binary mode)

Raw PCM bytes as a binary WebSocket frame. No JSON parsing is needed: the entire frame payload is audio data (24 kHz, 16-bit, mono, little-endian).
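
Because the frames carry headerless PCM, most players need a container before they can open the audio. One option is to wrap the collected frames in a WAV header with Python's standard wave module. This is a sketch; the function name pcm_to_wav is hypothetical, and the defaults come from the format stated above.

```python
import io
import wave


def pcm_to_wav(frames, sample_rate=24000):
    """Wrap raw 16-bit mono PCM frames in a WAV container, returned as bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(frames))
    # The header is finalized when the wave writer closes; buf stays open.
    return buf.getvalue()
```

Write the returned bytes to a .wav file to play the result in any standard audio tool.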

done

Sent when all audio for the current generation has been delivered. Always sent as a JSON text frame, even in binary mode.

JSON
{
  "type": "done",
  "total_chunks": 5,
  "duration_ms": 2500,
  "first_chunk_latency_ms": 460
}

error

Sent when an error occurs. Non-fatal errors (such as a rate limit on a single generation) keep the connection open. Fatal errors close the connection.

JSON
{
  "type": "error",
  "message": "All slots occupied, try again shortly",
  "code": 4006
}

pong

Response to a ping message.

JSON
{"type": "pong"}

Text Buffering

Text is accumulated server-side and flushed at natural boundaries for better prosody. This is critical for LLM integration where tokens arrive one at a time.

Rule | Condition | Behavior
Sentence boundary | Buffer >= 50 chars + sentence end (.!? + space) | Flush up to boundary
Clause boundary | Buffer >= 50 chars + clause end (,;: + space) | Flush up to boundary
Force flush | Buffer >= 200 chars | Flush entire buffer (or at best boundary)
Explicit flush | Client sends {"type":"flush"} | Flush immediately, any size
Buffer limit | Buffer would exceed 4096 chars | Error (buffer overflow)
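
To reason about when audio will start for a given token stream, the rules above can be modeled locally. The class below is an illustrative model, not the server's implementation. It assumes the thresholds apply to the whole buffer, that the last matching boundary is used, and that sentence boundaries take priority over clause boundaries; none of these details are confirmed by the table.

```python
import re

MIN_BOUNDARY_CHARS = 50    # sentence/clause rules apply from this size
FORCE_FLUSH_CHARS = 200    # flush even without a boundary
MAX_BUFFER_CHARS = 4096    # beyond this the server reports an overflow

SENTENCE_END = re.compile(r"[.!?] ")
CLAUSE_END = re.compile(r"[,;:] ")


def _last_boundary(buf, pattern):
    """Index just past the last match of pattern, or None if no match."""
    end = None
    for match in pattern.finditer(buf):
        end = match.end()
    return end


class TextBuffer:
    """Local model of the buffering rules (assumptions noted above)."""

    def __init__(self):
        self.buf = ""

    def push(self, text):
        """Add text and return the list of segments that would be flushed."""
        if len(self.buf) + len(text) > MAX_BUFFER_CHARS:
            raise OverflowError("buffer would exceed 4096 chars")
        self.buf += text
        flushed = []
        while len(self.buf) >= MIN_BOUNDARY_CHARS:
            cut = (_last_boundary(self.buf, SENTENCE_END)
                   or _last_boundary(self.buf, CLAUSE_END))
            if cut is None and len(self.buf) >= FORCE_FLUSH_CHARS:
                cut = len(self.buf)  # force flush: no boundary available
            if cut is None:
                break
            flushed.append(self.buf[:cut])
            self.buf = self.buf[cut:]
        return flushed

    def flush(self):
        """Explicit flush: return everything buffered, whatever its size."""
        out, self.buf = self.buf, ""
        return out
```

Pushing a short token such as "Hello, " returns an empty list in this model, because the buffer is still under 50 characters; that is the situation where an explicit flush matters.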

Note

Multiple generations per session: A single WebSocket connection supports multiple text→audio cycles. Send text, receive audio + done, then send more text. The voice configuration persists for the entire session.

Binary Mode

Binary mode eliminates base64 encoding overhead for lower latency. Audio arrives as raw binary WebSocket frames; control messages (done, error, pong) remain as JSON text frames.

JSON
// 1. Connect and configure
→ {"type": "config", "api_key": "...", "voice_description": "..."}
← {"type": "config_ack", "session_id": "a1b2c3d4"}

// 2. Enable binary mode
→ {"type": "binary_mode"}
← {"type": "binary_mode_ack", "sample_rate": 24000, "format": "pcm_s16le"}

// 3. Send text
→ {"type": "text", "text": "Hello, world!"}
→ {"type": "flush"}

// 4. Receive audio
← [binary frame: raw PCM bytes]
← [binary frame: raw PCM bytes]
← {"type": "done", "total_chunks": 2, "duration_ms": 1200, "first_chunk_latency_ms": 460}

Detecting frame type

In binary mode, check the WebSocket frame type to distinguish audio from control messages. Binary frames contain PCM audio. Text frames contain JSON (done, error, pong).
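
The binary-mode flow and the frame-type check can be sketched as a single coroutine. To stay library-neutral, it takes any object ws with async send() and recv() methods, where recv() returns bytes for binary frames and str for text frames (the Python websockets package behaves this way, for example). The function name synthesize is hypothetical, and the sketch waits for each ack before continuing, which is stricter than the protocol requires.

```python
import asyncio
import json


async def synthesize(ws, api_key, voice_description, text):
    """Run one text-to-audio cycle in binary mode; return (pcm, done_msg)."""

    async def expect(msg_type):
        # Read one control message and check its type.
        msg = json.loads(await ws.recv())
        if msg.get("type") != msg_type:
            raise RuntimeError(f"expected {msg_type}, got {msg}")
        return msg

    await ws.send(json.dumps({"type": "config", "api_key": api_key,
                              "voice_description": voice_description}))
    await expect("config_ack")
    await ws.send(json.dumps({"type": "binary_mode"}))
    await expect("binary_mode_ack")

    await ws.send(json.dumps({"type": "text", "text": text}))
    await ws.send(json.dumps({"type": "flush"}))

    pcm = bytearray()
    while True:
        frame = await ws.recv()
        if isinstance(frame, bytes):
            pcm.extend(frame)       # binary frame: raw PCM audio
            continue
        msg = json.loads(frame)     # text frame: JSON control message
        if msg["type"] == "done":
            return bytes(pcm), msg
        if msg["type"] == "error":
            raise RuntimeError(f"{msg.get('code')}: {msg.get('message')}")
        # Other control messages (such as pong) are ignored here.
```

Raising on every error message is a simplification; a real client would likely keep the connection for non-fatal errors and retry the generation.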

Close Codes

Code | Name | Description
1001 | GOING_AWAY | Server shutting down gracefully
4001 | AUTH_TIMEOUT | No config message received within 10 seconds
4002 | AUTH_FAILED | Invalid API key or plan does not include WebSocket
4003 | INVALID_MESSAGE | Malformed JSON or unexpected message type
4004 | RATE_LIMITED | Too many concurrent connections or generations
4005 | SERVER_ERROR | Internal server error
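
A client can map these codes to a reconnect decision. In the sketch below, the enum values come from the table, while the set of codes treated as retryable is an assumption of this example (auth and message-format failures will not fix themselves on retry) rather than documented guidance.

```python
from enum import IntEnum


class CloseCode(IntEnum):
    GOING_AWAY = 1001
    AUTH_TIMEOUT = 4001
    AUTH_FAILED = 4002
    INVALID_MESSAGE = 4003
    RATE_LIMITED = 4004
    SERVER_ERROR = 4005


# Assumption: only transient conditions are worth reconnecting for.
RETRYABLE = {CloseCode.GOING_AWAY, CloseCode.RATE_LIMITED, CloseCode.SERVER_ERROR}


def should_reconnect(code):
    """Return True if reconnecting (ideally with backoff) seems worthwhile."""
    try:
        return CloseCode(code) in RETRYABLE
    except ValueError:
        return False  # unknown code: do not retry blindly
```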

Rate Limits

Limit | Default | Description
Concurrent connections | 10 per API key | Maximum open WebSocket connections
Generations per minute | 100 per API key | Sliding window, resets continuously
Global connections | 500 total | Server-wide connection cap

Character usage on WebSocket counts against your plan's monthly character quota, same as HTTP endpoints.
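
To stay under the generations-per-minute limit, a client can keep its own sliding window and check it before each flush. The sketch below is a client-side approximation; the server's exact accounting is not documented here, and the class name GenerationLimiter is hypothetical. The clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class GenerationLimiter:
    """Client-side sliding-window counter (approximation of the server limit)."""

    def __init__(self, limit=100, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.stamps = deque()  # start times of recent generations

    def try_acquire(self):
        """Record a generation and return True, or return False if over limit."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.stamps and now - self.stamps[0] >= self.window_s:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

When try_acquire() returns False, delaying the flush avoids a RATE_LIMITED response for that generation.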

Testing with wscat

bash
# Install
npm install -g wscat

# Connect
wscat -c wss://api.murmr.dev/v1/realtime

# Authenticate (paste and press Enter)
{"type":"config","api_key":"murmr_sk_live_xxx","voice_description":"A warm narrator","language":"English"}

# Wait for config_ack, then send text
{"type":"text","text":"Hello, world! This is a test of the WebSocket protocol."}
{"type":"flush"}

# You'll receive audio chunks (base64 JSON), then a done event
