Text-to-Speech API for 10 Languages: A Developer Guide

Build multilingual voice apps with murmr's TTS API. Native-quality speech in 10 languages including Japanese, Chinese, Korean, and European languages.

murmr Team
February 17, 2026 · 9 min read
#languages #multilingual #voicedesign #guide #i18n

Most TTS APIs treat non-English languages as an afterthought. The voice sounds robotic, the intonation is wrong, and an English accent bleeds through. murmr takes a different approach: fewer languages, but each one sounds like a native speaker.

Supported Languages

murmr supports 10 languages, each with full VoiceDesign support:

| Language | Code | VoiceDesign | Streaming | Batch |
|----------|------|-------------|-----------|-------|
| Chinese (Mandarin) | zh | Yes | Yes | Yes |
| English | en | Yes | Yes | Yes |
| Japanese | ja | Yes | Yes | Yes |
| Korean | ko | Yes | Yes | Yes |
| German | de | Yes | Yes | Yes |
| French | fr | Yes | Yes | Yes |
| Russian | ru | Yes | Yes | Yes |
| Portuguese | pt | Yes | Yes | Yes |
| Spanish | es | Yes | Yes | Yes |
| Italian | it | Yes | Yes | Yes |

Every language supports the full API: VoiceDesign (describe any voice), SSE streaming, batch generation, WebSocket realtime, and saved voices.

Quick Start

Generating speech in any language is a one-parameter change:

Generate speech in any language
# Japanese
curl -X POST https://api.murmr.dev/v1/voices/design \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "こんにちは、お元気ですか?",
    "voice_description": "A young Japanese woman with a warm, polite tone",
    "language": "ja"
  }' --output speech-ja.wav

# German
curl -X POST https://api.murmr.dev/v1/voices/design \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Willkommen bei unserer Anwendung.",
    "voice_description": "A professional German man, clear and authoritative",
    "language": "de"
  }' --output speech-de.wav
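
The same request can be assembled from TypeScript. The sketch below is a translation of the curl calls above under a few assumptions: it reuses the endpoint, headers, and JSON fields shown there, while `buildDesignRequest` and `DESIGN_LANGUAGES` are hypothetical names for illustration and not part of any murmr SDK. Building the request is kept separate from sending it, so the payload can be inspected without a network call.

```typescript
// Language codes taken from the "Supported Languages" table.
const DESIGN_LANGUAGES = ['zh', 'en', 'ja', 'ko', 'de', 'fr', 'ru', 'pt', 'es', 'it'];

interface DesignRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the same request the curl examples send.
function buildDesignRequest(
  apiKey: string,
  text: string,
  voiceDescription: string,
  language: string
): DesignRequest {
  // Reject codes that are not in the supported list before any request is made.
  if (DESIGN_LANGUAGES.indexOf(language) === -1) {
    throw new Error('Unsupported language code: ' + language);
  }
  return {
    url: 'https://api.murmr.dev/v1/voices/design',
    init: {
      method: 'POST',
      headers: {
        Authorization: 'Bearer ' + apiKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        text: text,
        voice_description: voiceDescription,
        language: language,
      }),
    },
  };
}
```

Passing `req.url` and `req.init` to `fetch` and reading the response with `arrayBuffer()` would then yield the audio bytes, assuming the endpoint responds as the curl examples suggest.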

Voice Descriptions by Language

VoiceDesign understands voice descriptions in any of the 10 supported languages. You can describe a voice in its native language for the most natural results, or use English descriptions for any language.

Describing Voices in Native Languages

Native-language voice descriptions
// Chinese — describe the voice in Chinese
{
  "text": "欢迎使用我们的应用程序。",
  "voice_description": "一位温暖的中年女性播音员,语速适中,普通话标准",
  "language": "zh"
}

// Japanese — describe the voice in Japanese
{
  "text": "本日はご利用いただきありがとうございます。",
  "voice_description": "落ち着いた声の若い女性アナウンサー。丁寧で聞きやすい話し方",
  "language": "ja"
}

// Korean — describe the voice in Korean
{
  "text": "안녕하세요, 무엇을 도와드릴까요?",
  "voice_description": "친절하고 전문적인 젊은 여성 상담원",
  "language": "ko"
}

// German — describe the voice in German
{
  "text": "Herzlich willkommen zu unserem Service.",
  "voice_description": "Ein professioneller Nachrichtensprecher mit klarer Aussprache",
  "language": "de"
}

// French — describe the voice in French
{
  "text": "Bienvenue sur notre plateforme.",
  "voice_description": "Une narratrice chaleureuse avec une voix claire et posée",
  "language": "fr"
}

English descriptions often work well

You don't have to write descriptions in the target language. In most cases, "A warm Japanese woman in her 30s" works about as well as the Japanese equivalent. The model's training data was heavily English-annotated, so English descriptions are reliable for all 10 languages.

Proven Descriptions for Each Language

Here are voice descriptions that produce consistently good results:

| Language | Example Description |
|----------|---------------------|
| Chinese | "A professional female news anchor with standard Mandarin pronunciation" |
| English | "A warm, clear narrator with a neutral American accent" |
| Japanese | "A polite young woman with a gentle, clear speaking style" |
| Korean | "A friendly, professional male announcer" |
| German | "A confident male speaker with clear Hochdeutsch" |
| French | "A warm female narrator with a Parisian accent" |
| Russian | "A deep-voiced male narrator with clear diction" |
| Portuguese | "A friendly Brazilian woman with a warm tone" |
| Spanish | "A professional male narrator with a neutral Latin American accent" |
| Italian | "A warm, expressive Italian woman in her 30s" |

Building Multilingual Apps

Dynamic Language Selection

For apps that serve multiple languages, select the TTS language based on user locale:

typescript
const LANGUAGE_MAP: Record<string, string> = {
  'zh-CN': 'zh',
  'zh-TW': 'zh',
  'ja-JP': 'ja',
  'ko-KR': 'ko',
  'de-DE': 'de',
  'fr-FR': 'fr',
  'ru-RU': 'ru',
  'pt-BR': 'pt',
  'pt-PT': 'pt',
  'es-ES': 'es',
  'es-MX': 'es',
  'it-IT': 'it',
  'en-US': 'en',
  'en-GB': 'en',
};

function getLanguageCode(locale: string): string {
  return LANGUAGE_MAP[locale] || 'en';
}
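
The exact-match lookup above falls back to English for any locale missing from the map, such as `de-AT` or a bare `fr`. One possible refinement is to reduce the locale to its primary language subtag before giving up. The sketch below illustrates that idea; `resolveLanguage` and `SUPPORTED_CODES` are hypothetical names, not part of the murmr API.

```typescript
// The 10 codes from the "Supported Languages" table.
const SUPPORTED_CODES: Record<string, boolean> = {
  zh: true, en: true, ja: true, ko: true, de: true,
  fr: true, ru: true, pt: true, es: true, it: true,
};

// Hypothetical resolver: reduces a locale such as "de-AT", "pt_BR", or
// "zh-Hant-TW" to its primary subtag and checks it against SUPPORTED_CODES.
function resolveLanguage(locale: string, fallback: string = 'en'): string {
  const primary = locale.trim().toLowerCase().split(/[-_]/)[0];
  return SUPPORTED_CODES[primary] === true ? primary : fallback;
}
```

This removes the need to list every region variant by hand. Unsupported languages such as `nl-NL` still resolve to the fallback.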

One Voice Per Language

For consistent branding, save a VoiceDesign voice for each language your app supports. Saved voices produce identical output every time:

typescript
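// Map language codes to saved voice IDs
const VOICE_IDS: Record<string, string> = {
  en: 'voice_abc123',
  ja: 'voice_def456',
  de: 'voice_ghi789',
  fr: 'voice_jkl012',
  es: 'voice_mno345',
};

async function speak(text: string, locale: string): Promise<ArrayBuffer> {
  const language = getLanguageCode(locale);
  const voiceId = VOICE_IDS[language];
  if (!voiceId) {
    // Without this check the request would be sent with no voice at all
    throw new Error(`No saved voice for language "${language}"`);
  }

  const response = await fetch('https://api.murmr.dev/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.MURMR_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: text,
      voice: voiceId,
    }),
  });

  // Surface HTTP errors instead of returning an error body as audio
  if (!response.ok) {
    throw new Error(`Speech request failed with status ${response.status}`);
  }

  return response.arrayBuffer();
}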

Saved voice limits

The number of voices you can save depends on your plan: Free (3), Starter (10), Pro (25), Realtime (25), Scale (100). With 10 languages, the Starter plan covers one voice per language.
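
The limits quoted above make plan selection a matter of simple arithmetic. The sketch below returns the first listed plan whose limit covers a given number of saved voices. Treating the listed order as smallest to largest is an assumption, since only limits and not prices are stated here, and `smallestPlanFor` is a hypothetical helper.

```typescript
// Saved-voice limits as listed above, in the order given there.
const PLAN_LIMITS: Array<{ plan: string; savedVoices: number }> = [
  { plan: 'Free', savedVoices: 3 },
  { plan: 'Starter', savedVoices: 10 },
  { plan: 'Pro', savedVoices: 25 },
  { plan: 'Realtime', savedVoices: 25 },
  { plan: 'Scale', savedVoices: 100 },
];

// Returns the first listed plan that can hold `voicesNeeded` saved voices,
// or null when even the largest limit is too small.
function smallestPlanFor(voicesNeeded: number): string | null {
  for (let i = 0; i < PLAN_LIMITS.length; i++) {
    if (PLAN_LIMITS[i].savedVoices >= voicesNeeded) {
      return PLAN_LIMITS[i].plan;
    }
  }
  return null;
}
```

One voice per language (10 voices) fits the Starter plan, while two voices per language (20 voices) would need Pro under this ordering.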

Language-Specific Tips

Chinese (Mandarin)

Mandarin is a tonal language, and murmr handles tones correctly from text. No pinyin annotation needed — just send standard simplified or traditional Chinese characters.

json
{
  "text": "今天天气真好,我们去公园散步吧。",
  "voice_description": "A cheerful young Chinese woman",
  "language": "zh"
}

Japanese

Japanese text can mix kanji, hiragana, and katakana. The model handles all three scripts natively. For best results with names or unusual kanji readings, add furigana in parentheses.

json
{
  "text": "東京タワーは日本で最も有名な観光地の一つです。",
  "voice_description": "A calm male narrator for a travel guide",
  "language": "ja"
}

Korean

Korean's agglutinative grammar produces long compound words. The model handles these naturally, including proper spacing and intonation for formal polite (합니다체) and informal polite (해요체) speech levels.

json
{
  "text": "오늘 회의에서 논의된 내용을 정리해 드리겠습니다.",
  "voice_description": "A professional female business presenter",
  "language": "ko"
}

European Languages (German, French, Spanish, Portuguese, Italian, Russian)

European languages work reliably with standard voice descriptions. A few tips:

  • German: Use "Hochdeutsch" for standard German pronunciation. Regional dialects (Bavarian, Swiss) are not reliably reproduced.
  • French: Specify "Parisian" for metropolitan French. Canadian French is less consistent.
  • Spanish: Both Castilian and Latin American accents work. Specify in the description.
  • Portuguese: Specify "Brazilian" or "European" — they sound quite different.
  • Russian: The model handles Cyrillic text natively. No transliteration needed.

Audio Formats

All languages support the same output formats:

| Format | Endpoint | Use Case |
|--------|----------|----------|
| PCM (24kHz, 16-bit, mono) | Streaming | Real-time playback |
| WAV | Batch (default) | Download, processing |
| MP3 | Batch | Web distribution |
| Opus | Batch | Low bandwidth |
| AAC | Batch | Mobile apps |
| FLAC | Batch | Archival quality |

Set the format via the response_format parameter on batch endpoints:

bash
curl -X POST https://api.murmr.dev/v1/audio/speech \
  -H "Authorization: Bearer $MURMR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Bonjour le monde",
    "voice": "voice_abc123",
    "response_format": "mp3"
  }' --output speech.mp3
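
To keep the requested format and the output filename in sync, a small helper can build both together. In the sketch below only `"mp3"` is confirmed by the example above. The other `response_format` strings are assumed to be the lowercase format names from the table, and `buildSpeechJob` is a hypothetical helper rather than part of the API.

```typescript
// Batch formats from the table. Only "mp3" appears verbatim in the curl
// example; the remaining identifiers are assumptions.
const BATCH_FORMATS = ['wav', 'mp3', 'opus', 'aac', 'flac'];

interface SpeechJob {
  body: { input: string; voice: string; response_format: string };
  filename: string;
}

// Builds the JSON body for /v1/audio/speech plus a matching output filename.
// The default of "wav" follows the table, which marks WAV as the batch default.
function buildSpeechJob(
  input: string,
  voice: string,
  format: string = 'wav',
  basename: string = 'speech'
): SpeechJob {
  if (BATCH_FORMATS.indexOf(format) === -1) {
    throw new Error('Unknown batch format: ' + format);
  }
  return {
    body: { input: input, voice: voice, response_format: format },
    filename: basename + '.' + format,
  };
}
```

Using the format string as the file extension is a convenience here, not something the API requires.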

Latency by Language

All 10 languages run on the same Qwen3-TTS model with identical inference performance. There's no latency penalty for non-English languages — a Japanese request generates as fast as an English one.

Typical latency (time to first chunk for streaming, total time for batch):

  • SSE streaming: ~300ms time-to-first-chunk
  • WebSocket: ~200ms time-to-first-chunk
  • Batch: ~1-3s for complete audio (varies by text length)
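
When buffering a stream for playback, the PCM parameters listed under Audio Formats (24 kHz, 16-bit, mono) fix the relationship between bytes received and seconds of audio. The sketch below works through that arithmetic. It assumes the stream carries raw samples with no container header, and the function names are hypothetical.

```typescript
// Stream parameters from the Audio Formats table: 24 kHz, 16-bit, mono.
const PCM_SAMPLE_RATE = 24000;
const PCM_BYTES_PER_SAMPLE = 2; // 16 bits
const PCM_CHANNELS = 1;
const PCM_BYTES_PER_SECOND =
  PCM_SAMPLE_RATE * PCM_BYTES_PER_SAMPLE * PCM_CHANNELS; // 48000

// Duration of audio, in milliseconds, held by `byteLength` bytes of PCM.
function pcmDurationMs(byteLength: number): number {
  return (byteLength / PCM_BYTES_PER_SECOND) * 1000;
}

// Number of bytes needed to hold `ms` milliseconds of audio, rounded down
// to a whole sample frame so a buffer never ends mid-sample.
function pcmBytesFor(ms: number): number {
  const frameSize = PCM_BYTES_PER_SAMPLE * PCM_CHANNELS;
  const frames = Math.floor((ms / 1000) * PCM_SAMPLE_RATE);
  return frames * frameSize;
}
```

At this rate one second of audio is 48,000 bytes, so a quarter-second playback buffer holds 12,000 bytes.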

What's Not Supported (Yet)

  • Mixed-language text: Switching between languages mid-sentence produces inconsistent results. Split into separate requests per language instead.
  • Regional dialects: Standard accents work well, but regional variants (Bavarian German, Kansai Japanese) and other Chinese varieties such as Cantonese are unreliable.
  • Code-switching: Common in multilingual communities (e.g., Spanglish), but not well supported. Use the dominant language.
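
For the mixed-language case, splitting text into per-language requests can be approximated by segmenting on writing system. The sketch below is a rough heuristic and not a language detector: it covers only a handful of Unicode ranges, it cannot tell the six Latin-script languages apart (the caller supplies `latinDefault`), and Japanese written entirely in kanji would be labeled `zh`. All names in it are hypothetical.

```typescript
type Script = 'hangul' | 'cjk' | 'cyrillic' | 'latin';

interface Segment { script: Script; text: string; }

// Hiragana and Katakana blocks.
function isKana(code: number): boolean {
  return code >= 0x3040 && code <= 0x30ff;
}

// Classifies one UTF-16 code unit. Returns null for spaces, digits,
// punctuation, and anything outside the ranges below.
function scriptOf(code: number): Script | null {
  if ((code >= 0xac00 && code <= 0xd7af) || (code >= 0x1100 && code <= 0x11ff)) return 'hangul';
  if (isKana(code) || (code >= 0x4e00 && code <= 0x9fff)) return 'cjk';
  if (code >= 0x0400 && code <= 0x04ff) return 'cyrillic';
  // Basic Latin letters plus an approximate range for accented letters.
  if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a) ||
      (code >= 0xc0 && code <= 0x24f)) return 'latin';
  return null;
}

// Splits text into runs of a single script. Neutral characters stay with
// the preceding run, or are prepended to the first run if they come first.
// Text with no scripted characters at all yields an empty array.
function splitByScript(text: string): Segment[] {
  const segments: Segment[] = [];
  let pending = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const script = scriptOf(text.charCodeAt(i));
    const last = segments.length > 0 ? segments[segments.length - 1] : null;
    if (script === null) {
      if (last) { last.text += ch; } else { pending += ch; }
    } else if (last && last.script === script) {
      last.text += ch;
    } else {
      segments.push({ script: script, text: pending + ch });
      pending = '';
    }
  }
  return segments;
}

// Heuristic mapping from a segment to one of the supported language codes.
function guessLanguage(segment: Segment, latinDefault: string): string {
  if (segment.script === 'hangul') return 'ko';
  if (segment.script === 'cyrillic') return 'ru';
  if (segment.script === 'latin') return latinDefault;
  for (let i = 0; i < segment.text.length; i++) {
    if (isKana(segment.text.charCodeAt(i))) return 'ja';
  }
  return 'zh';
}
```

Each resulting segment can then be sent as its own request with the guessed `language` value. For text that mixes two Latin-script languages, such as Spanish and English, this approach does not help and a proper language-identification step would be needed.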

Get Started

  1. Sign up at murmr.dev — the free plan includes 10,000 characters/month
  2. Try voices in the Voice Playground — test all 10 languages interactively
  3. Check the API reference for the full parameter spec
  4. Read the VoiceDesign guide for voice description best practices