Most TTS APIs force you to choose from preset voices or upload audio samples for cloning. VoiceDesign takes a different approach: describe the voice you want in natural language, and murmr creates it from scratch.

What is VoiceDesign?

VoiceDesign is murmr's unique approach to voice creation. Instead of:

Choosing from limited preset voices
Recording your own voice samples
Hiring voice actors and uploading recordings

You simply describe what you want:

"A warm, professional woman in her 30s with a slight
French accent. Speaks calmly and clearly, like a
meditation app instructor."

VoiceDesign generates audio that matches your description—no samples needed, no voice actor required, and infinite variations possible.

How It Works

Under the hood, VoiceDesign uses the Qwen3-TTS model's ability to condition speech generation on text descriptions. The model was trained on millions of hours of speech with corresponding descriptions, learning the relationship between descriptive text and vocal characteristics.

When you send a request, murmr:

Processes your voice description to extract vocal characteristics
Generates speech conditioned on both the description and your input text
Returns audio that matches the described voice

VoiceDesign API Request

curl -X POST https://api.murmr.dev/v1/voices/design \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our application. I am here to help you.",
    "voice_description": "A warm, professional narrator with a calm pace",
    "language": "en"
  }'
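
If you'd rather call the endpoint from code, the same request can be sketched in Python with only the standard library. The URL, headers, and JSON fields mirror the curl example above. The helper names `build_design_request` and `send_and_save` are ours for illustration, and reading the whole response as raw bytes is an assumption, since the response format isn't spelled out here.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.murmr.dev/v1/voices/design"


def build_design_request(text, voice_description, language="en", api_key=None):
    """Build (but do not send) a VoiceDesign POST request."""
    # Fall back to the same environment variable the curl example uses.
    api_key = api_key or os.environ.get("MURMR_API_KEY", "")
    payload = {
        "text": text,
        "voice_description": voice_description,
        "language": language,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_and_save(request, path):
    """Send the request and write the raw response body to `path`."""
    # Assumption: the body can be read in one go and stored as-is.
    with urllib.request.urlopen(request) as response:
        body = response.read()
    with open(path, "wb") as out:
        out.write(body)
    return len(body)
```

Keeping the build step separate from the send step makes it easy to inspect or log the payload before spending a request.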

Writing Effective Descriptions

The quality of your voice depends heavily on how you describe it. Here's what to include:

Demographics

Age, gender, and origin help set the baseline:

"A young woman in her early 20s"
"An older gentleman, perhaps 60s"
"A native German speaker"

Tone and Energy

How does the voice feel emotionally?

"Warm and friendly"
"Professional and authoritative"
"Energetic and enthusiastic"
"Calm and soothing"

Pace and Delivery

Speaking style affects comprehension and mood:

"Speaks slowly and deliberately"
"Quick-paced and dynamic"
"Measured pace with clear enunciation"

Accent and Language

Be specific about accents, and name a nationality rather than a regional dialect:

"British accent" or "American accent"
"Native German speaker"
Avoid: "Bavarian accent" (regional dialects are less reliable)

Pro tip

Descriptions work in any of murmr's 10 supported languages. You can describe a Japanese voice in Japanese, or describe a German voice in English—the model understands both.
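
One way to keep descriptions consistent is to assemble them from the four ingredients above. The helper below is a hypothetical convenience, not part of murmr's API: it just joins the pieces into short sentences you can pass as `voice_description`.

```python
def compose_description(demographics, tone=None, pace=None, accent=None, comparison=None):
    """Join the description ingredients into a short natural-language prompt."""
    # Order: who is speaking, their accent, how they sound, how they deliver,
    # and an optional "like a ..." comparison.
    parts = [demographics, accent, tone, pace, comparison]
    sentences = []
    for part in parts:
        if not part:
            continue  # skip ingredients that were not provided
        part = part.strip().rstrip(".")
        if part:
            # Capitalize the first letter and close each piece with a period.
            sentences.append(part[0].upper() + part[1:] + ".")
    return " ".join(sentences)


description = compose_description(
    "a professional male narrator in his 40s",
    tone="warm but businesslike",
    pace="speaks at a measured pace",
    accent="clear American accent",
)
```

Building descriptions from named parts also makes it easier to change one ingredient at a time when you iterate.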

Example Descriptions

Here are proven descriptions for common use cases:

Corporate Narrator

"A professional male narrator in his 40s with a clear American accent.
Speaks with authority and confidence, at a measured pace suitable for
corporate training videos. Warm but businesslike."

Meditation Guide

"A calm, soothing female voice. Speaks very slowly with gentle pauses.
Soft-spoken and peaceful, like a yoga instructor guiding relaxation."

Audiobook Narrator

"A warm British narrator with a rich, expressive voice. Varies pace
and emotion naturally, perfect for storytelling. Male, middle-aged,
with excellent diction."

Customer Service Agent

"A friendly, helpful young woman with a neutral American accent.
Professional but approachable. Speaks clearly at a natural
conversational pace."

German News Anchor

"Ein professioneller Nachrichtensprecher mit klarer Aussprache und
neutralem Hochdeutsch. Sachlich und vertrauenswürdig, männlich,
mittleren Alters."

Saving Voices for Reuse

Each VoiceDesign request generates a slightly different voice, even with the same description. For consistency across your application, save the voices you like.

Saving a voice is a two-step process:

Generate audio with VoiceDesign — the streaming response includes the voice data
Save the voice via the Voice Management API — stores the voice for reuse

Once saved, use the standard /v1/audio/speech endpoint with the voice ID for consistent results:

Using a saved voice

# Use a saved voice (OpenAI-compatible endpoint)
curl -X POST https://api.murmr.dev/v1/audio/speech \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Any text here",
    "voice": "voice_abc123"
  }' \
  --output speech.wav

Saved voices use the OpenAI-compatible endpoint instead of VoiceDesign. They're faster and produce consistent output every time. See the Voice Management docs for the full save workflow.

| Plan | Saved Voice Slots |
|------|-------------------|
| Free | 3 |
| Starter | 10 |
| Pro | 25 |
| Scale | 100 |

Best Practices

Summary

Be specific but concise. Include demographics, tone, pace, and accent. Test with your actual content, not just "hello world."

1. Test with real content

The same voice can sound different with different text. Test with samples of your actual content—if you're building a meditation app, test with meditation scripts.

2. Iterate on descriptions

VoiceDesign is not deterministic. Run the same description multiple times and save the best result. Think of it like casting auditions.
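
The audition idea can be sketched as a small loop. This is a minimal example under assumptions: `generate` is a hypothetical callable you supply that sends one VoiceDesign request and returns the audio bytes, and the `.wav` suffix is borrowed from the saved-voice example rather than confirmed for VoiceDesign output.

```python
from pathlib import Path
from typing import Callable, List


def run_auditions(
    generate: Callable[[str, str], bytes],
    text: str,
    description: str,
    takes: int = 5,
    out_dir: str = "auditions",
    suffix: str = ".wav",  # assumption: adjust to the format you actually receive
) -> List[Path]:
    """Call `generate` several times with the same inputs and keep every take."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, takes + 1):
        # Same text and description each time; the voice still varies per request.
        audio = generate(text, description)
        path = folder / f"take_{i:02d}{suffix}"
        path.write_bytes(audio)
        paths.append(path)
    return paths
```

Listen to the numbered takes side by side, then save the one you prefer through the Voice Management API.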

3. Use English descriptions for non-English output

Surprisingly, English descriptions often work better than native-language descriptions, even for non-English voices. The model's training data was heavily English-annotated.

4. Specify nationality for accents

"Native German speaker" works better than "German accent" for authentic results. Regional dialects (Bavarian, Berliner) rarely work—stick to standard accents.

Current Limitations

VoiceDesign is powerful but not magic. Here's what to expect:

Not deterministic: Same description produces similar but not identical voices
Regional dialects: Standard accents work well; regional dialects don't
Celebrity voices: The model won't recreate specific real people
Extreme characteristics: Very unusual voices (robot, alien) have mixed results
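
A quick pre-flight check can catch descriptions that run into these limits before you spend a request. The sketch below is a hypothetical helper, not a murmr feature. Its keyword lists only cover the examples named in this post (Bavarian, Berliner, robot, alien), so extend them for your own use.

```python
# Terms this post calls out as unreliable; extend these lists as needed.
REGIONAL_DIALECTS = ("bavarian", "berliner")
EXTREME_TERMS = ("robot", "alien")


def check_description(description):
    """Return a list of warnings for terms this post says are unreliable."""
    lowered = description.lower()
    warnings = []
    for term in REGIONAL_DIALECTS:
        if term in lowered:
            warnings.append(f"'{term}' is a regional dialect; prefer a standard accent")
    for term in EXTREME_TERMS:
        if term in lowered:
            warnings.append(f"'{term}' voices have mixed results")
    return warnings
```

An empty list only means none of the listed keywords appeared. It says nothing about how the voice will actually sound.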

Info

Voice cloning from audio samples is coming soon. Until then, VoiceDesign is the way to create custom voices without recording.

Try it now in the Voice Playground. Design voices interactively, then save the ones you like.

VoiceDesign Mastery: Creating the Perfect Voice

murmr Team
