Most TTS APIs force you to choose from preset voices or upload audio samples for cloning. VoiceDesign takes a different approach: describe the voice you want in natural language, and murmr creates it from scratch.
What is VoiceDesign?
VoiceDesign is murmr's unique approach to voice creation. Instead of:
- Choosing from limited preset voices
- Recording your own voice samples
- Hiring voice actors and uploading recordings
You simply describe what you want:
"A warm, professional woman in her 30s with a slight
French accent. Speaks calmly and clearly, like a
meditation app instructor."
VoiceDesign generates audio that matches your description—no samples needed, no voice actor required, and infinite variations possible.
How It Works
Under the hood, VoiceDesign uses the Qwen3-TTS model's ability to condition speech generation on text descriptions. The model was trained on millions of hours of speech with corresponding descriptions, learning the relationship between descriptive text and vocal characteristics.
When you send a request, murmr:
- Processes your voice description to extract vocal characteristics
- Generates speech conditioned on both the description and your input text
- Returns audio that matches the described voice
curl -X POST https://api.murmr.dev/v1/voices/design \
-H "Authorization: Bearer $MURMR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Welcome to our application. I am here to help you.",
"voice_description": "A warm, professional narrator with a calm pace",
"language": "en"
}'
Writing Effective Descriptions
The quality of your voice depends heavily on how you describe it. Here's what to include:
Demographics
Age, gender, and origin help set the baseline:
- "A young woman in her early 20s"
- "An older gentleman, perhaps 60s"
- "A native German speaker"
Tone and Energy
How does the voice feel emotionally?
- "Warm and friendly"
- "Professional and authoritative"
- "Energetic and enthusiastic"
- "Calm and soothing"
Pace and Delivery
Speaking style affects comprehension and mood:
- "Speaks slowly and deliberately"
- "Quick-paced and dynamic"
- "Measured pace with clear enunciation"
Accent and Language
Be specific about accents—use nationality rather than regional dialects:
- "British accent" or "American accent"
- "Native German speaker"
- Avoid: "Bavarian accent" (regional dialects are less reliable)
Pro tip
Descriptions work in any of murmr's 10 supported languages. You can describe a Japanese voice in Japanese, or describe a German voice in English—the model understands both.
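As a rough sketch, the four ingredients above (demographics, tone, pace, accent) can be assembled in code and wrapped in the same request the curl example sends. The endpoint and JSON fields come from that example; the helper names compose_description and build_design_request are hypothetical, not part of any murmr SDK. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example earlier in this post.
DESIGN_URL = "https://api.murmr.dev/v1/voices/design"


def compose_description(demographics, tone=None, pace=None, accent=None):
    """Join the description ingredients into short sentences, skipping blanks.

    Hypothetical helper: murmr only sees the final string.
    """
    parts = [demographics, tone, pace, accent]
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one description ingredient is required")
    return ". ".join(cleaned) + "."


def build_design_request(text, voice_description, language="en", api_key="YOUR_API_KEY"):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "text": text,
        "voice_description": voice_description,
        "language": language,
    }
    return urllib.request.Request(
        DESIGN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping the build step separate makes it easy to inspect the payload before spending a request.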
Example Descriptions
Here are proven descriptions for common use cases:
Corporate Narrator
"A professional male narrator in his 40s with a clear American accent.
Speaks with authority and confidence, at a measured pace suitable for
corporate training videos. Warm but businesslike."
Meditation Guide
"A calm, soothing female voice. Speaks very slowly with gentle pauses.
Soft-spoken and peaceful, like a yoga instructor guiding relaxation."
Audiobook Narrator
"A warm British narrator with a rich, expressive voice. Varies pace
and emotion naturally, perfect for storytelling. Male, middle-aged,
with excellent diction."
Customer Service Agent
"A friendly, helpful young woman with a neutral American accent.
Professional but approachable. Speaks clearly at a natural
conversational pace."
German News Anchor
"Ein professioneller Nachrichtensprecher mit klarer Aussprache und
neutralem Hochdeutsch. Sachlich und vertrauenswürdig, männlich,
mittleren Alters."
Saving Voices for Reuse
Each VoiceDesign request generates a slightly different voice, even with the same description. For consistency across your application, save the voices you like.
Saving a voice is a two-step process:
- Generate audio with VoiceDesign — the streaming response includes the voice data
- Save the voice via the Voice Management API — stores the voice for reuse
Once saved, use the standard /v1/audio/speech endpoint with the voice ID for consistent results:
# Use a saved voice (OpenAI-compatible endpoint)
curl -X POST https://api.murmr.dev/v1/audio/speech \
-H "Authorization: Bearer $MURMR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "Any text here",
"voice": "voice_abc123"
}' \
--output speech.wav
Saved voices use the OpenAI-compatible endpoint instead of VoiceDesign. They're faster and produce consistent output every time. See the Voice Management docs for the full save workflow.
| Plan | Saved Voice Slots |
|------|-------------------|
| Free | 3 |
| Starter | 10 |
| Pro | 25 |
| Scale | 100 |
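For Python users, the saved-voice request from the curl example can be sketched with the standard library alone. The URL and the input and voice fields come from that example. The function names are hypothetical, and writing the response body straight to a file is an assumption based on curl's --output flag.

```python
import json
import urllib.request

# Endpoint taken from the saved-voice curl example.
SPEECH_URL = "https://api.murmr.dev/v1/audio/speech"


def build_speech_request(text, voice_id, api_key):
    """Build (but do not send) the POST request for a saved voice."""
    payload = {"input": text, "voice": voice_id}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, voice_id, api_key, path="speech.wav"):
    """Send the request and write the response body to disk.

    Needs network access and a valid key. Assumes the body is the audio itself.
    """
    request = build_speech_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return path
```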
Best Practices
Summary
Be specific but concise. Include demographics, tone, pace, and accent. Test with your actual content, not just "hello world."
1. Test with real content
The same voice can sound different with different text. Test with samples of your actual content—if you're building a meditation app, test with meditation scripts.
2. Iterate on descriptions
VoiceDesign is not deterministic. Run the same description multiple times and save the best result. Think of it like casting auditions.
3. Use English descriptions for non-English output
Surprisingly, English descriptions often work better than native-language descriptions, even for non-English voices. The model's training data was heavily English-annotated.
4. Specify nationality for accents
"Native German speaker" works better than "German accent" for authentic results. Regional dialects (Bavarian, Berliner) rarely work—stick to standard accents.
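The "casting auditions" idea from practice 2 can be sketched as a small loop that requests several takes of the same description with your real script, then writes them out for listening. Everything here is illustrative: run_auditions and save_takes are hypothetical helpers, and synthesize stands for whatever function you use to call VoiceDesign and return audio bytes.

```python
import os
from typing import Callable, List, Tuple


def run_auditions(
    description: str,
    script: str,
    synthesize: Callable[[str, str], bytes],
    takes: int = 3,
) -> List[Tuple[int, bytes]]:
    """Call synthesize(script, description) once per take.

    Returns (take_number, audio_bytes) pairs, numbered from 1.
    """
    if takes < 1:
        raise ValueError("takes must be at least 1")
    return [(n, synthesize(script, description)) for n in range(1, takes + 1)]


def save_takes(results, directory=".", prefix="audition"):
    """Write each take to <directory>/<prefix>_<n>.wav and return the paths."""
    paths = []
    for n, audio in results:
        path = os.path.join(directory, f"{prefix}_{n}.wav")
        with open(path, "wb") as out:
            out.write(audio)
        paths.append(path)
    return paths
```

Listen to the files, pick the take you like, and save that voice so later requests stay consistent.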
Current Limitations
VoiceDesign is powerful but not magic. Here's what to expect:
- Not deterministic: Same description produces similar but not identical voices
- Regional dialects: Standard accents work well; regional dialects don't
- Celebrity voices: The model won't recreate specific real people
- Extreme characteristics: Very unusual voices (robot, alien) have mixed results
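One way to catch two of these limitations before sending a request is a small client-side check on the description text. This is only a sketch. The term lists contain just the examples named in this post and are assumptions to extend yourself, and lint_description is a hypothetical helper, not a murmr feature.

```python
import re

# Examples named in this post; both lists are illustrative, not exhaustive.
REGIONAL_DIALECT_TERMS = ("bavarian", "berliner")
EXTREME_STYLE_TERMS = ("robot", "alien")


def lint_description(description):
    """Return a list of warnings for terms that tend to give unreliable results."""
    words = set(re.findall(r"[a-zäöüß]+", description.lower()))
    warnings = []
    for term in REGIONAL_DIALECT_TERMS:
        if term in words:
            warnings.append(f"regional dialect '{term}': prefer a standard accent")
    for term in EXTREME_STYLE_TERMS:
        if term in words:
            warnings.append(f"extreme characteristic '{term}': expect mixed results")
    return warnings
```

An empty list means nothing was flagged. It does not guarantee that the voice will match the description.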
Info
Voice cloning from audio samples is coming soon. Until then, VoiceDesign is the way to create custom voices without recording.
Try it now in the Voice Playground. Design voices interactively, then save the ones you like.