Next.js makes it simple to add voice to your application. In this tutorial, we'll build a text-to-speech feature with streaming audio playback — from API route to React component.
What We're Building
A Next.js app that:
- Takes text input from the user
- Sends it to murmr's VoiceDesign API via a server-side API route
- Streams audio back and plays it progressively in the browser
No audio files to manage, no pre-recorded clips. Just type text, describe a voice, and hear it speak.
Prerequisites
- A Next.js 14+ project (App Router)
- A murmr API key from the dashboard
- Node.js 18+
Step 1: Environment Setup
Add your API key to .env.local:
```
MURMR_API_KEY=your_api_key_here
```
Never expose your API key
The API key goes in .env.local (no NEXT_PUBLIC_ prefix). It should only be accessible server-side, in API routes and Server Components.
Step 2: Create the API Route
The API route proxies requests to murmr and streams audio back to the client. This keeps your API key server-side and lets you add your own auth, rate limiting, or logging.
```typescript
// app/api/speak/route.ts
import { NextRequest } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voiceDescription, language } = await req.json();

  const response = await fetch(
    'https://api.murmr.dev/v1/voices/design/stream',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MURMR_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        text,
        voice_description: voiceDescription,
        language: language || 'en',
      }),
    }
  );

  if (!response.ok) {
    const error = await response.text();
    return new Response(error, { status: response.status });
  }

  // Forward the SSE stream directly to the client
  return new Response(response.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}
```

The streaming variant forwards the SSE stream from murmr directly to the browser — no buffering on the server. The batch variant is simpler: it waits for the complete audio file and returns it as a WAV.
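For the batch variant, a route might look like the sketch below. Note the endpoint path (`/v1/tts`), the `voice_id` field name, and the `buildBatchPayload` helper are assumptions for illustration — the tutorial only shows the streaming URL, so check the murmr API reference for the real batch endpoint.

```typescript
// Hypothetical batch route for saved voices. The endpoint path and
// 'voice_id' field name below are assumptions, not confirmed API details.
export function buildBatchPayload(text: string, voiceId: string) {
  return JSON.stringify({
    text,
    voice_id: voiceId,
    response_format: 'wav', // batch endpoints accept other formats too
  });
}

export async function POST(req: Request) {
  const { text, voiceId } = await req.json();

  const response = await fetch('https://api.murmr.dev/v1/tts', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.MURMR_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: buildBatchPayload(text, voiceId),
  });

  if (!response.ok) {
    return new Response(await response.text(), { status: response.status });
  }

  // The whole WAV arrives at once; forward it with the right content type
  return new Response(response.body, {
    headers: { 'Content-Type': 'audio/wav' },
  });
}
```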
Step 3: Build the React Component
Now the client-side component that captures input and plays audio.
Streaming Playback
For the VoiceDesign streaming endpoint, audio arrives as Server-Sent Events with base64-encoded PCM chunks. We decode and queue them for playback:
```tsx
'use client';

import { useState, useRef, useCallback } from 'react';

export function VoicePlayer() {
  const [text, setText] = useState('');
  const [description, setDescription] = useState(
    'A warm, friendly narrator with clear enunciation'
  );
  const [isPlaying, setIsPlaying] = useState(false);
  const audioContextRef = useRef<AudioContext | null>(null);
  const nextStartTimeRef = useRef(0);

  const speak = useCallback(async () => {
    setIsPlaying(true);

    // Initialize AudioContext on user gesture
    if (!audioContextRef.current) {
      audioContextRef.current = new AudioContext({ sampleRate: 24000 });
    }
    const ctx = audioContextRef.current;
    nextStartTimeRef.current = ctx.currentTime;

    const response = await fetch('/api/speak', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text,
        voiceDescription: description,
        language: 'en',
      }),
    });

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = JSON.parse(line.slice(6));

        if (data.audio) {
          // Decode base64 PCM and schedule playback
          const pcmBytes = atob(data.audio);
          const samples = new Float32Array(pcmBytes.length / 2);
          for (let i = 0; i < samples.length; i++) {
            const int16 =
              pcmBytes.charCodeAt(i * 2) |
              (pcmBytes.charCodeAt(i * 2 + 1) << 8);
            samples[i] = (int16 > 32767 ? int16 - 65536 : int16) / 32768;
          }

          const audioBuffer = ctx.createBuffer(1, samples.length, 24000);
          audioBuffer.copyToChannel(samples, 0);

          const source = ctx.createBufferSource();
          source.buffer = audioBuffer;
          source.connect(ctx.destination);

          const startTime = Math.max(
            ctx.currentTime,
            nextStartTimeRef.current
          );
          source.start(startTime);
          nextStartTimeRef.current = startTime + audioBuffer.duration;
        }
      }
    }

    // Wait for all audio to finish
    const remaining = nextStartTimeRef.current - ctx.currentTime;
    if (remaining > 0) {
      await new Promise((r) => setTimeout(r, remaining * 1000));
    }
    setIsPlaying(false);
  }, [text, description]);

  return (
    <div className="space-y-4">
      <textarea
        value={description}
        onChange={(e) => setDescription(e.target.value)}
        placeholder="Describe the voice..."
        className="w-full p-3 rounded bg-zinc-800 text-zinc-100"
        rows={2}
      />
      <textarea
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Enter text to speak..."
        className="w-full p-3 rounded bg-zinc-800 text-zinc-100"
        rows={4}
      />
      <button
        onClick={speak}
        disabled={isPlaying || !text}
        className="px-6 py-2 bg-amber-500 text-zinc-900 rounded
                   font-medium disabled:opacity-50"
      >
        {isPlaying ? 'Speaking...' : 'Speak'}
      </button>
    </div>
  );
}
```

Why schedule with startTime?
Using source.start(startTime) instead of source.start() ensures gapless playback. Each chunk is scheduled to begin exactly when the previous one ends, eliminating clicks and pauses between chunks.
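The scheduling rule can be isolated as a small pure function (illustrative only — `scheduleChunks` is not part of the component above): given the context's current time and the durations of the decoded chunks, each chunk starts at the later of "now" and the previous chunk's end.

```typescript
// Pure version of the gapless-scheduling logic: returns the start time
// for each chunk so that chunk N begins exactly when chunk N-1 ends,
// but never earlier than `now` (the AudioContext's currentTime).
export function scheduleChunks(now: number, durations: number[]): number[] {
  const starts: number[] = [];
  let nextStart = now;
  for (const duration of durations) {
    const start = Math.max(now, nextStart);
    starts.push(start);
    nextStart = start + duration;
  }
  return starts;
}
```

Three half-second chunks arriving at time 0 are scheduled back-to-back at 0, 0.5, and 1.0 seconds, regardless of when each one finished decoding.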
Batch Playback
If you're using saved voices, the batch approach skips SSE parsing entirely. Fetch the complete file and play it:
```typescript
async function speakWithSavedVoice(text: string, voiceId: string) {
  const response = await fetch('/api/speak', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, voiceId }),
  });

  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.play();
  audio.onended = () => URL.revokeObjectURL(url);
}
```
Step 4: Add Input Validation
Validate request bodies in your API route to prevent abuse:
```typescript
import { z } from 'zod';

const speakSchema = z.object({
  text: z.string().min(1).max(5000),
  voiceDescription: z.string().min(1).max(500),
  language: z.string().length(2).default('en'),
});

export async function POST(req: NextRequest) {
  const body = speakSchema.parse(await req.json());
  // ... rest of handler
}
```
Step 5: Deploy to Production
murmr's API handles all the GPU compute — your Next.js app just proxies requests. Deploy normally:
```bash
vercel deploy
```
Make sure MURMR_API_KEY is set in your Vercel project's environment variables.
Production Checklist
- API key security: Verify your key is only in server-side env vars (no `NEXT_PUBLIC_` prefix)
- Rate limiting: Add rate limiting to your API route to prevent abuse. murmr enforces plan limits, but you should also protect your own endpoint
- Error handling: Show user-friendly errors when the API is unavailable or quota is exceeded
- Audio format: The streaming endpoint returns 24kHz 16-bit mono PCM. The batch endpoint returns WAV by default (configurable via `response_format`)
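For the rate-limiting item, a minimal per-key fixed-window limiter is enough for a single-instance deployment. This is a sketch with arbitrary limit and window values; for serverless or multi-instance setups, the counter would need a shared store such as Redis:

```typescript
// In-memory fixed-window limiter. State lives in the module, so it only
// persists while the process is warm and is NOT shared across instances.
const windows = new Map<string, { count: number; start: number }>();

export function allowRequest(
  key: string,
  limit = 10,
  windowMs = 60_000,
  now = Date.now()
): boolean {
  const w = windows.get(key);
  // Start a fresh window if none exists or the old one has expired
  if (!w || now - w.start >= windowMs) {
    windows.set(key, { count: 1, start: now });
    return true;
  }
  w.count += 1;
  return w.count <= limit;
}
```

In the route handler you might key on `req.headers.get('x-forwarded-for') ?? 'unknown'` and return a 429 when `allowRequest` is false.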
Choosing Between Streaming and Batch
| | Streaming (VoiceDesign) | Batch (Saved Voice) |
|---|---|---|
| Latency | First audio in ~300ms | Full audio after generation completes |
| Use case | Interactive, real-time | Pre-generated, downloadable |
| Voice | Describe on-the-fly | Consistent saved voice |
| Complexity | SSE parsing + audio scheduling | Simple fetch + play |
| Plans | All plans (Free tier: 5/day) | All plans |
For most apps, start with streaming VoiceDesign for prototyping, then save voices you like and switch to the batch endpoint for production consistency.
Next Steps
- Save voices: Use the Voice Management API to save VoiceDesign voices for reuse
- Multiple languages: murmr supports 10 languages — just change the `language` parameter
- Real-time agents: For voice agents that need sub-200ms latency, check out the WebSocket API
- Audio formats: Request `mp3`, `opus`, `aac`, or `flac` via the `response_format` parameter on batch endpoints