
How to Add Voice to Your Next.js App

Add text-to-speech to your Next.js app in under 50 lines. Stream audio from an API route using VoiceDesign or saved voices.

murmr Team
February 17, 2026 · 7 min read
#nextjs #tutorial #streaming #typescript #react

Next.js makes it simple to add voice to your application. In this tutorial, we'll build a text-to-speech feature with streaming audio playback — from API route to React component.

What We're Building

A Next.js app that:

  1. Takes text input from the user
  2. Sends it to murmr's VoiceDesign API via a server-side API route
  3. Streams audio back and plays it progressively in the browser

No audio files to manage, no pre-recorded clips. Just type text, describe a voice, and hear it speak.

Prerequisites

  • A Next.js 14+ project (App Router)
  • A murmr API key from the dashboard
  • Node.js 18+

Step 1: Environment Setup

Add your API key to .env.local:

text
MURMR_API_KEY=your_api_key_here

Never expose your API key

The API key goes in .env.local (no NEXT_PUBLIC_ prefix). It should only be accessible server-side, in API routes and Server Components.

Step 2: Create the API Route

The API route proxies requests to murmr and streams audio back to the client. This keeps your API key server-side and lets you add your own auth, rate limiting, or logging.

app/api/speak/route.ts
import { NextRequest } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voiceDescription, language } = await req.json();

  const response = await fetch(
    'https://api.murmr.dev/v1/voices/design/stream',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MURMR_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        text,
        voice_description: voiceDescription,
        language: language || 'en',
      }),
    }
  );

  if (!response.ok) {
    const error = await response.text();
    return new Response(error, { status: response.status });
  }

  // Forward the SSE stream directly to the client
  return new Response(response.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}

The streaming variant forwards the SSE stream from murmr directly to the browser — no buffering on the server. The batch variant is simpler: it waits for the complete audio file and returns it as a WAV.
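A batch route handler can be sketched in the same proxy style. Note that the upstream endpoint URL (`/v1/speech`) and the `voice_id` field name below are assumptions for illustration, not confirmed murmr API details; check the API reference for the exact names.

```typescript
// Sketch of a batch variant (e.g. app/api/speak-batch/route.ts).
// Upstream URL and voice_id field are assumptions, not confirmed API details.
export async function POST(req: Request): Promise<Response> {
  const { text, voiceId } = await req.json();

  const upstream = await fetch('https://api.murmr.dev/v1/speech', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.MURMR_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text, voice_id: voiceId }),
  });

  if (!upstream.ok) {
    return new Response(await upstream.text(), { status: upstream.status });
  }

  // No SSE headers needed: the batch endpoint returns a complete WAV file
  return new Response(upstream.body, {
    headers: { 'Content-Type': 'audio/wav' },
  });
}
```

Because the whole file is buffered upstream, the handler just forwards bytes with an `audio/wav` content type.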

Step 3: Build the React Component

Now the client-side component that captures input and plays audio.

Streaming Playback

For the VoiceDesign streaming endpoint, audio arrives as Server-Sent Events with base64-encoded PCM chunks. We decode and queue them for playback:

components/voice-player.tsx
'use client';

import { useState, useRef, useCallback } from 'react';

export function VoicePlayer() {
  const [text, setText] = useState('');
  const [description, setDescription] = useState(
    'A warm, friendly narrator with clear enunciation'
  );
  const [isPlaying, setIsPlaying] = useState(false);
  const audioContextRef = useRef<AudioContext | null>(null);
  const nextStartTimeRef = useRef(0);

  const speak = useCallback(async () => {
    setIsPlaying(true);

    // Initialize AudioContext on user gesture
    if (!audioContextRef.current) {
      audioContextRef.current = new AudioContext({ sampleRate: 24000 });
    }
    const ctx = audioContextRef.current;
    nextStartTimeRef.current = ctx.currentTime;

    const response = await fetch('/api/speak', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text,
        voiceDescription: description,
        language: 'en',
      }),
    });

    if (!response.ok || !response.body) {
      // Don't leave the button stuck on "Speaking..." if the request fails
      setIsPlaying(false);
      return;
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() || '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = JSON.parse(line.slice(6));

        if (data.audio) {
          // Decode base64 PCM and schedule playback
          const pcmBytes = atob(data.audio);
          const samples = new Float32Array(pcmBytes.length / 2);
          for (let i = 0; i < samples.length; i++) {
            const int16 =
              pcmBytes.charCodeAt(i * 2) |
              (pcmBytes.charCodeAt(i * 2 + 1) << 8);
            samples[i] = (int16 > 32767 ? int16 - 65536 : int16) / 32768;
          }

          const audioBuffer = ctx.createBuffer(1, samples.length, 24000);
          audioBuffer.copyToChannel(samples, 0);

          const source = ctx.createBufferSource();
          source.buffer = audioBuffer;
          source.connect(ctx.destination);

          const startTime = Math.max(
            ctx.currentTime,
            nextStartTimeRef.current
          );
          source.start(startTime);
          nextStartTimeRef.current =
            startTime + audioBuffer.duration;
        }
      }
    }

    // Wait for all audio to finish
    const remaining = nextStartTimeRef.current - ctx.currentTime;
    if (remaining > 0) {
      await new Promise((r) => setTimeout(r, remaining * 1000));
    }
    setIsPlaying(false);
  }, [text, description]);

  return (
    <div className="space-y-4">
      <textarea
        value={description}
        onChange={(e) => setDescription(e.target.value)}
        placeholder="Describe the voice..."
        className="w-full p-3 rounded bg-zinc-800 text-zinc-100"
        rows={2}
      />
      <textarea
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Enter text to speak..."
        className="w-full p-3 rounded bg-zinc-800 text-zinc-100"
        rows={4}
      />
      <button
        onClick={speak}
        disabled={isPlaying || !text}
        className="px-6 py-2 bg-amber-500 text-zinc-900 rounded
                   font-medium disabled:opacity-50"
      >
        {isPlaying ? 'Speaking...' : 'Speak'}
      </button>
    </div>
  );
}

Why schedule with startTime?

Using source.start(startTime) instead of source.start() ensures gapless playback. Each chunk is scheduled to begin exactly when the previous one ends, eliminating clicks and pauses between chunks.
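The scheduling math can be isolated as a pure helper. This is a sketch, assuming chunks arrive faster than real time so `currentTime` stays behind the queue:

```typescript
// Pure sketch of the gapless scheduling used above: each chunk starts at
// max(now, end of previous chunk), so chunks never overlap or leave gaps.
function scheduleChunks(currentTime: number, durations: number[]): number[] {
  let nextStart = currentTime;
  return durations.map((duration) => {
    const start = Math.max(currentTime, nextStart);
    nextStart = start + duration;
    return start;
  });
}

// Three chunks of 0.5s, 0.25s, 0.5s queued at t=0 start back-to-back:
// scheduleChunks(0, [0.5, 0.25, 0.5]) → [0, 0.5, 0.75]
```

If a chunk arrives late (after `nextStart` has passed), `Math.max` snaps it to the current time instead of scheduling it in the past.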

Batch Playback

If you're using saved voices, there's no SSE parsing at all. Fetch the complete file and play it:

typescript
async function speakWithSavedVoice(text: string, voiceId: string) {
  const response = await fetch('/api/speak', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, voiceId }),
  });

  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.play();

  audio.onended = () => URL.revokeObjectURL(url);
}

Step 4: Add Input Validation

Validate request bodies in your API route to prevent abuse:

typescript
import { z } from 'zod';

const speakSchema = z.object({
  text: z.string().min(1).max(5000),
  voiceDescription: z.string().min(1).max(500),
  language: z.string().length(2).default('en'),
});

export async function POST(req: NextRequest) {
  const result = speakSchema.safeParse(await req.json());
  if (!result.success) {
    // parse() would throw and surface as a 500; safeParse returns a clean 400
    return Response.json(result.error.flatten(), { status: 400 });
  }
  const { text, voiceDescription, language } = result.data;
  // ... rest of handler
}

Step 5: Deploy to Production

murmr's API handles all the GPU compute — your Next.js app just proxies requests. Deploy normally:

bash
vercel deploy

Make sure MURMR_API_KEY is set in your Vercel project's environment variables.

Production Checklist

  • API key security: Verify your key is only in server-side env vars (no NEXT_PUBLIC_ prefix)
  • Rate limiting: Add rate limiting to your API route to prevent abuse. murmr enforces plan limits, but you should also protect your own endpoint
  • Error handling: Show user-friendly errors when the API is unavailable or quota is exceeded
  • Audio format: The streaming endpoint returns 24kHz 16-bit mono PCM. The batch endpoint returns WAV by default (configurable via response_format)
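The rate-limiting item above can be sketched with a naive in-memory sliding window. This is per server instance only; on serverless platforms each instance keeps its own map, so use a shared store such as Redis or Upstash there instead:

```typescript
// Naive sliding-window rate limiter keyed by IP. In-memory state is
// per-instance, so this is a sketch, not a production serverless solution.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;
const hits = new Map<string, number[]>();

export function isRateLimited(ip: string, now = Date.now()): boolean {
  // Keep only timestamps still inside the window
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return true;
  }
  recent.push(now);
  hits.set(ip, recent);
  return false;
}
```

In the route handler, check it before calling upstream: `if (isRateLimited(ip)) return new Response('Too many requests', { status: 429 });`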

Choosing Between Streaming and Batch

| | Streaming (VoiceDesign) | Batch (Saved Voice) |
|---|---|---|
| Latency | First audio in ~300ms | Full audio after generation completes |
| Use case | Interactive, real-time | Pre-generated, downloadable |
| Voice | Describe on-the-fly | Consistent saved voice |
| Complexity | SSE parsing + audio scheduling | Simple fetch + play |
| Plans | All plans (Free tier: 5/day) | All plans |

For most apps, start with streaming VoiceDesign for prototyping, then save voices you like and switch to the batch endpoint for production consistency.

Next Steps

  • Save voices: Use the Voice Management API to save VoiceDesign voices for reuse
  • Multiple languages: murmr supports 10 languages — just change the language parameter
  • Real-time agents: For voice agents that need sub-200ms latency, check out the WebSocket API
  • Audio formats: Request mp3, opus, aac, or flac via the response_format parameter on batch endpoints
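As a sketch, requesting a different format is just one more field in the request body. The route from Step 2 would need to forward `response_format` upstream for this to take effect:

```typescript
// Build a batch request body asking for a specific output format.
// Assumes your /api/speak route forwards response_format upstream.
type AudioFormat = 'wav' | 'mp3' | 'opus' | 'aac' | 'flac';

function buildSpeakBody(
  text: string,
  voiceId: string,
  format: AudioFormat = 'wav'
): string {
  return JSON.stringify({ text, voiceId, response_format: format });
}
```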