Tutorials

VoiceDesign Mastery: Creating the Perfect Voice

Learn how to use murmr's VoiceDesign to create any voice from text descriptions. Includes best practices, examples, and tips for consistent voices.

murmr Team
January 30, 2026 · 6 min read
#voicedesign #voices #tutorial #getting-started #custom-voices

Most TTS APIs force you to choose from preset voices or upload audio samples for cloning. VoiceDesign takes a different approach: describe the voice you want in natural language, and murmr creates it from scratch.

What is VoiceDesign?

VoiceDesign is murmr's unique approach to voice creation. Instead of:

  • Choosing from limited preset voices
  • Recording your own voice samples
  • Hiring voice actors and uploading recordings

You simply describe what you want:

"A warm, professional woman in her 30s with a slight
French accent. Speaks calmly and clearly, like a
meditation app instructor."

VoiceDesign generates audio that matches your description, with no samples or voice actor required, and you can generate as many variations as you want to try.

How It Works

Under the hood, VoiceDesign uses the Qwen3-TTS model's ability to condition speech generation on text descriptions. The model was trained on millions of hours of speech with corresponding descriptions, learning the relationship between descriptive text and vocal characteristics.

When you send a request, murmr:

  1. Processes your voice description to extract vocal characteristics
  2. Generates speech conditioned on both the description and your input text
  3. Returns audio that matches the described voice

VoiceDesign API Request
curl -X POST https://api.murmr.dev/v1/voices/design \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to our application. I am here to help you.",
    "voice_description": "A warm, professional narrator with a calm pace",
    "language": "en"
  }'
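The inline `-d '{...}'` body works for short, single-line strings, but a description that contains double quotes or line breaks (like the multi-line examples later in this post) will break the JSON. Below is a minimal bash sketch that escapes the three fields shown in the request above before sending. `json_escape` and `build_design_payload` are illustrative helper names, not part of murmr's API, and the escaping only covers backslashes, double quotes, newlines, carriage returns, and tabs.

```shell
#!/usr/bin/env bash
# Sketch: build the VoiceDesign request body with JSON escaping.
# json_escape and build_design_payload are illustrative helpers, not murmr APIs.

# Escape backslashes, double quotes, and common whitespace for a JSON string.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash -> \\ (must run first)
  s=${s//"$q"/"$bs$q"}     # double quote -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"
}

# build_design_payload TEXT DESCRIPTION [LANGUAGE]
# Prints a JSON body with the three fields from the request above;
# LANGUAGE defaults to "en" as in that example.
build_design_payload() {
  local text description language
  text=$(json_escape "$1")
  description=$(json_escape "$2")
  language=$(json_escape "${3:-en}")
  printf '{"text":"%s","voice_description":"%s","language":"%s"}\n' \
    "$text" "$description" "$language"
}

# Example (sends a real request; requires MURMR_API_KEY):
#   curl -X POST https://api.murmr.dev/v1/voices/design \
#     -H "Authorization: Bearer $MURMR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(build_design_payload "Welcome to our application." \
#           'A warm narrator who says "welcome" with a smile')"
```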

Writing Effective Descriptions

The quality of your voice depends heavily on how you describe it. Here's what to include:

Demographics

Age, gender, and origin help set the baseline:

  • "A young woman in her early 20s"
  • "An older gentleman, perhaps 60s"
  • "A native German speaker"

Tone and Energy

How does the voice feel emotionally?

  • "Warm and friendly"
  • "Professional and authoritative"
  • "Energetic and enthusiastic"
  • "Calm and soothing"

Pace and Delivery

Speaking style affects comprehension and mood:

  • "Speaks slowly and deliberately"
  • "Quick-paced and dynamic"
  • "Measured pace with clear enunciation"

Accent and Language

Be specific about accents, and name a nationality or a standard accent rather than a regional dialect:

  • "British accent" or "American accent"
  • "Native German speaker"
  • Avoid: "Bavarian accent" (regional dialects are less reliable)
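If you generate descriptions programmatically (for example, one per character in a game), it can help to keep the four ingredients above separate and join them at request time. A small bash sketch follows; `build_voice_description` is a hypothetical helper, and the one-clause-per-ingredient format is just one reasonable convention, not something murmr requires.

```shell
#!/usr/bin/env bash
# Sketch: join description ingredients (demographics, tone, pace, accent)
# into one VoiceDesign description. build_voice_description is hypothetical.
build_voice_description() {
  local out="" part
  for part in "$@"; do
    # Skip empty ingredients so the result has no stray periods.
    [ -z "$part" ] && continue
    # Drop one trailing period so every clause ends with exactly one.
    part=${part%.}
    # Add a separating space only after the first clause.
    out+="${out:+ }$part."
  done
  printf '%s\n' "$out"
}

# Example:
#   build_voice_description "An older gentleman, perhaps 60s" \
#     "Calm and soothing" "Measured pace with clear enunciation" "British accent"
#   # prints: An older gentleman, perhaps 60s. Calm and soothing. Measured pace with clear enunciation. British accent.
```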

Pro tip

Descriptions work in any of murmr's 10 supported languages. You can describe a Japanese voice in Japanese or a German voice in English; the model understands both. In practice, English descriptions often work best (see Best Practices below).

Example Descriptions

Here are descriptions that work well for common use cases:

Corporate Narrator

"A professional male narrator in his 40s with a clear American accent.
Speaks with authority and confidence, at a measured pace suitable for
corporate training videos. Warm but businesslike."

Meditation Guide

"A calm, soothing female voice. Speaks very slowly with gentle pauses.
Soft-spoken and peaceful, like a yoga instructor guiding relaxation."

Audiobook Narrator

"A warm British narrator with a rich, expressive voice. Varies pace
and emotion naturally, perfect for storytelling. Male, middle-aged,
with excellent diction."

Customer Service Agent

"A friendly, helpful young woman with a neutral American accent.
Professional but approachable. Speaks clearly at a natural
conversational pace."

German News Anchor

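"Ein professioneller Nachrichtensprecher mit klarer Aussprache und
neutralem Hochdeutsch. Sachlich und vertrauenswürdig, männlich,
mittleren Alters."

(English: "A professional news anchor with clear pronunciation and
neutral Standard German. Matter-of-fact and trustworthy, male,
middle-aged.")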

Saving Voices for Reuse

Each VoiceDesign request generates a slightly different voice, even with the same description. For consistency across your application, save the voices you like.

Saving a voice is a two-step process:

  1. Generate audio with VoiceDesign — the streaming response includes the voice data
  2. Save the voice via the Voice Management API — stores the voice for reuse

Once saved, use the standard /v1/audio/speech endpoint with the voice ID for consistent results:

Using a saved voice
# Use a saved voice (OpenAI-compatible endpoint)
curl -X POST https://api.murmr.dev/v1/audio/speech \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Any text here",
    "voice": "voice_abc123"
  }' \
  --output speech.wav

Saved voices use the OpenAI-compatible endpoint instead of VoiceDesign. They're faster and produce consistent output every time. See the Voice Management docs for the full save workflow.
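To call the saved-voice endpoint from a script, you can wrap the request above in a function. This bash sketch assumes `MURMR_API_KEY` is set and sends only the two fields shown in the example (`input` and `voice`). `speak_with_voice` and `json_escape` are hypothetical helpers, and `--fail` is added so that an HTTP error is not silently written to the output file.

```shell
#!/usr/bin/env bash
# Sketch: synthesize speech with a saved voice via /v1/audio/speech.
# speak_with_voice and json_escape are hypothetical helpers; the JSON
# fields (input, voice) match the curl example in this post.

# Escape backslashes, double quotes, and common whitespace for a JSON string.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash -> \\ (must run first)
  s=${s//"$q"/"$bs$q"}     # double quote -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"
}

# speak_with_voice VOICE_ID TEXT [OUTFILE]
speak_with_voice() {
  local voice_id=$1 text=$2 outfile=${3:-speech.wav} payload
  payload=$(printf '{"input":"%s","voice":"%s"}' \
    "$(json_escape "$text")" "$(json_escape "$voice_id")")
  # --fail: exit non-zero on HTTP errors instead of saving the error body.
  curl --fail -sS -X POST https://api.murmr.dev/v1/audio/speech \
    -H "Authorization: Bearer $MURMR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    --output "$outfile"
}
```

For example, `speak_with_voice voice_abc123 "Your order has shipped." order.wav` would save that sentence to `order.wav`, assuming `voice_abc123` is one of your saved voice IDs.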

Saved voice slots by plan:

| Plan | Saved Voice Slots |
|------|-------------------|
| Free | 3 |
| Starter | 10 |
| Pro | 25 |
| Scale | 100 |
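If your application saves voices automatically, a quick capacity check against the plan limits can avoid a failed save. A bash sketch, with the limits copied from the slot table; the function names are hypothetical, and the API itself remains the authority on what your account allows.

```shell
#!/usr/bin/env bash
# Sketch: check saved-voice capacity before saving another voice.
# Limits are copied from the plan table; function names are hypothetical.

# voice_slots_for_plan PLAN -> prints the slot limit (case-insensitive)
voice_slots_for_plan() {
  local plan
  plan=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $plan in
    free)    echo 3 ;;
    starter) echo 10 ;;
    pro)     echo 25 ;;
    scale)   echo 100 ;;
    *)       echo "unknown plan: $1" >&2; return 1 ;;
  esac
}

# remaining_voice_slots PLAN USED -> prints how many more voices fit (never below 0)
remaining_voice_slots() {
  local limit left
  limit=$(voice_slots_for_plan "$1") || return 1
  left=$(( limit - ${2:-0} ))
  if (( left < 0 )); then
    left=0
  fi
  echo "$left"
}

# Example: remaining_voice_slots starter 7   # prints 3
```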

Best Practices

Summary

Be specific but concise. Include demographics, tone, pace, and accent. Test with your actual content, not just "hello world."

1. Test with real content

The same voice can sound different with different text. Test with samples of your actual content—if you're building a meditation app, test with meditation scripts.

2. Iterate on descriptions

VoiceDesign is not deterministic. Run the same description multiple times and save the best result. Think of it like casting auditions.
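The audition approach can be scripted: send the same VoiceDesign request several times and keep every response for side-by-side listening. This bash sketch assumes `MURMR_API_KEY` is set and stores each raw response body unchanged as `take_N.raw`, because this post does not spell out the format of the streaming VoiceDesign response; check the Voice Management docs for how to extract the audio and voice data before saving a favorite. `audition_voice` and `json_escape` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: run one VoiceDesign description several times ("auditions").
# Assumption: each raw response body is kept as-is; its exact format is
# not documented in this post. audition_voice/json_escape are hypothetical.

# Escape backslashes, double quotes, and common whitespace for a JSON string.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash -> \\ (must run first)
  s=${s//"$q"/"$bs$q"}     # double quote -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"
}

# audition_voice DESCRIPTION TEXT [TAKES] [OUTDIR]
audition_voice() {
  local description=$1 text=$2 takes=${3:-5} outdir=${4:-auditions} payload i
  payload=$(printf '{"text":"%s","voice_description":"%s","language":"en"}' \
    "$(json_escape "$text")" "$(json_escape "$description")")
  mkdir -p "$outdir" || return 1
  for ((i = 1; i <= takes; i++)); do
    # Identical request each time: VoiceDesign is not deterministic,
    # so every take can sound slightly different.
    curl --fail -sS -X POST https://api.murmr.dev/v1/voices/design \
      -H "Authorization: Bearer $MURMR_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$payload" \
      --output "$outdir/take_$i.raw" || return 1
  done
}
```

Using your real content as the text (per the first tip), `audition_voice "A calm, soothing female voice. Speaks very slowly with gentle pauses." "Take a deep breath in." 5` would leave five takes in `auditions/` to compare.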

3. Use English descriptions for non-English output

Surprisingly, English descriptions often work better than native-language descriptions, even for non-English voices. The model's training data was heavily English-annotated.

4. Specify nationality for accents

"Native German speaker" works better than "German accent" for authentic results. Regional dialects (Bavarian, Berliner) rarely work—stick to standard accents.

Current Limitations

VoiceDesign is powerful but not magic. Here's what to expect:

  • Not deterministic: Same description produces similar but not identical voices
  • Regional dialects: Standard accents work well; regional dialects don't
  • Celebrity voices: The model won't recreate specific real people
  • Extreme characteristics: Very unusual voices (robot, alien) have mixed results

Info

Voice cloning from audio samples is coming soon. Until then, VoiceDesign is the way to create custom voices without recording.

Try it now in the Voice Playground. Design voices interactively, then save the ones you like.
