
Own Your Voice Data: Portable Voice Embeddings

Extract voice embeddings, store them in your own database, and use them in any TTS request. The recommended pattern for multi-tenant apps.

murmr Team
March 8, 2026 · 5 min read
#multi-tenant #voice-embeddings #developer-tools #portable-data

What if you could extract a voice's identity, store it in your own database, and use it anywhere — no vendor lock-in? That's exactly what portable voice embeddings give you.

The Problem

If you're building a multi-tenant app — a voice studio, audiobook platform, or voice agent SaaS — you've hit this wall before. You have one API key, but many users. Whose voices are whose?

Most TTS providers force you into opaque voice IDs locked to their platform. Your users' voices exist in someone else's database, tied to your account, with no way to export or migrate them. Want to switch providers? Start over. Want to let users own their voice data? Not possible.

This creates real problems:

  • Vendor lock-in: Your users' voices are trapped in a third-party system
  • Multi-tenancy gaps: No clean separation between users under a shared API key
  • Compliance risk: Voice data lives outside your infrastructure, outside your control
  • Scalability limits: Plan-based voice slot caps don't scale with your user base

The Solution: Portable Voice Embeddings

murmr's extract-embeddings endpoint returns raw embedding data — a compact representation of a voice's identity — that you can store wherever you want. Your database, your blob storage, your edge cache. The workflow is three steps:

  1. Extract — Send a reference audio clip and its transcript. Get back prompt_data (the voice embedding).
  2. Store — Save prompt_data in your own database, associated with your user.
  3. Use — Pass prompt_data inline with any TTS request. No voice ID needed.

No voice slots consumed. No platform dependency. The embedding is yours.

Step 1: Extract Embeddings

Upload a short audio clip (5-30 seconds of clear speech) along with its transcript. The API returns prompt_data — a serialized tensor that captures the speaker's identity.
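Before uploading, it can be worth checking that a clip actually falls inside the 5-30 second window. The following is a minimal sketch using only the Python standard library; it assumes the reference clip is an uncompressed PCM WAV file (the `wave` module reads nothing else), and the helper names `wav_duration_seconds`, `is_usable_reference`, and `make_silent_wav` are illustrative, not part of the murmr SDK.

```python
import io
import wave

MIN_SECONDS = 5.0   # lower bound from the guidance above
MAX_SECONDS = 30.0  # upper bound from the guidance above


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_bytes: bytes) -> bool:
    """True if the clip length falls inside the 5-30 second window."""
    return MIN_SECONDS <= wav_duration_seconds(wav_bytes) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

A length check like this only catches clips that are too short or too long. It says nothing about whether the speech is clear or whether the transcript matches the audio.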

Extract embeddings
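from murmr import Murmr

client = Murmr()

# Read the reference clip as bytes; the context manager closes the file.
with open("reference.wav", "rb") as f:
    audio_bytes = f.read()

result = client.voices.extract_embeddings(
    audio=audio_bytes,
    ref_text="The transcript of the audio.",
)

# Store result.prompt_data in your database.
# `db` and `user_id` are placeholders supplied by your application.
db.execute(
    "INSERT INTO voices (user_id, name, prompt_data) VALUES (%s, %s, %s)",
    [user_id, "My Voice", result.prompt_data],
)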

The prompt_data field is a base64-encoded string, typically 50-200KB. It stores cleanly in any database column that supports text or binary data.
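As one possible shape for the storage side, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for your own database. The `voices` table mirrors the columns in the snippet above. The helpers `open_store`, `save_voice`, and `load_voice` are hypothetical names, and the base64 check relies only on the statement that `prompt_data` is a base64-encoded string. SQLite uses `?` placeholders rather than `%s`.

```python
import base64
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS voices (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     TEXT NOT NULL,
    name        TEXT NOT NULL,
    prompt_data TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS voices_by_user ON voices (user_id);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the voice store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_voice(conn: sqlite3.Connection, user_id: str, name: str,
               prompt_data: str) -> int:
    """Insert one voice for a user and return its row id."""
    # Reject anything that is not valid base64 before it reaches the table.
    # Raises binascii.Error (a ValueError subclass) on bad input.
    base64.b64decode(prompt_data, validate=True)
    cur = conn.execute(
        "INSERT INTO voices (user_id, name, prompt_data) VALUES (?, ?, ?)",
        (user_id, name, prompt_data),
    )
    conn.commit()
    return cur.lastrowid


def load_voice(conn: sqlite3.Connection, user_id: str,
               voice_id: int) -> Optional[str]:
    """Return prompt_data only if the voice belongs to this user."""
    row = conn.execute(
        "SELECT prompt_data FROM voices WHERE id = ? AND user_id = ?",
        (voice_id, user_id),
    ).fetchone()
    return row[0] if row else None
```

Filtering on both `id` and `user_id` in `load_voice` is the design choice that matters here: one tenant cannot read another tenant's embedding by guessing a voice id.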

Step 2: Use Stored Embeddings

When a user requests TTS, pull their embedding from your database and pass it inline. Set voice to "inline" to tell murmr to use the provided embedding instead of looking up a saved voice ID.

Use stored embeddings in TTS
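# Retrieve from your database (`db` and `voice_id` come from your application)
voice = db.fetchone("SELECT prompt_data FROM voices WHERE id = %s", [voice_id])
if voice is None:
    raise LookupError(f"No stored voice with id {voice_id}")

# voice="inline" tells murmr to use the embedding passed in voice_clone_prompt
audio = client.speech.stream(
    input="Hello from a portable voice!",
    voice="inline",
    voice_clone_prompt=voice["prompt_data"],
)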

That's it. No voice IDs, no saved voice slots, no platform state. The embedding carries everything the model needs to reproduce the voice.
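In a multi-tenant app you will probably want the lookup and the TTS call behind one function that enforces ownership. The sketch below is one way to do that, under stated assumptions: `client` is any object exposing `speech.stream(input=..., voice=..., voice_clone_prompt=...)` as in the snippet above, `lookup` is a hypothetical callable of your own that returns the stored `prompt_data` (or `None`) for a given user and voice, and `VoiceNotFound` and `synthesize_for_user` are illustrative names.

```python
from typing import Any, Callable, Optional


class VoiceNotFound(LookupError):
    """Raised when the voice does not exist or belongs to another user."""


def synthesize_for_user(
    client: Any,
    lookup: Callable[[str, int], Optional[str]],
    user_id: str,
    voice_id: int,
    text: str,
) -> Any:
    """Fetch the caller's own embedding and pass it inline to the TTS call."""
    prompt_data = lookup(user_id, voice_id)
    if prompt_data is None:
        # Same error for "missing" and "not yours", so ids cannot be probed.
        raise VoiceNotFound(f"voice {voice_id} not found for user {user_id}")
    return client.speech.stream(
        input=text,
        voice="inline",
        voice_clone_prompt=prompt_data,
    )
```

Because the client is passed in rather than created inside the function, the ownership check can be unit-tested with a stub and no network access.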

Why This Matters

Portable embeddings unlock patterns that opaque voice IDs simply can't support:

  • Data portability: Switch providers without losing your users' voices. The embeddings are standard data — export, back up, or migrate at will.
  • Unlimited multi-tenancy: Serve thousands of users from a single API key. Each user's voices live in your database, not in a shared pool with plan-based slot limits.
  • Edge caching: Cache embeddings at the edge for faster requests. They're just data — treat them like any other cacheable asset.
  • Privacy and compliance: Voice data stays in your infrastructure. You control retention, encryption, and deletion — critical for GDPR, HIPAA, or enterprise requirements.
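To make the caching and deletion points concrete, here is a minimal in-process sketch: a time-limited cache in front of your database lookup, with a method to drop everything belonging to one user when they ask for deletion. `EmbeddingCache` and its `loader` argument are hypothetical names. A real edge deployment would more likely use a shared cache service, so read this as an illustration of the pattern rather than a recommended implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, int]  # (user_id, voice_id)


class EmbeddingCache:
    """Tiny in-process TTL cache for prompt_data, keyed by user and voice."""

    def __init__(
        self,
        loader: Callable[[str, int], Optional[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # falls back to your database on a miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, str]] = {}

    def get(self, user_id: str, voice_id: int) -> Optional[str]:
        """Return cached prompt_data, reloading it once the entry expires."""
        key = (user_id, voice_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._loader(user_id, voice_id)
        if value is not None:
            self._entries[key] = (now + self._ttl, value)
        else:
            self._entries.pop(key, None)  # do not keep stale data around
        return value

    def invalidate_user(self, user_id: str) -> int:
        """Drop every cached voice for a user, e.g. after a deletion request."""
        stale = [key for key in self._entries if key[0] == user_id]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Keying on the pair `(user_id, voice_id)` keeps tenants separated in the cache as well as in the database, and calling `invalidate_user` alongside the database delete means a removed voice cannot keep being served from memory.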

How Other Providers Handle This

| Provider | Voice Storage | Portability |
|----------|--------------|-------------|
| ElevenLabs | Opaque voice IDs locked to their platform | No way to extract or export voice data |
| PlayHT | Voice IDs tied to their storage | No embedding export |
| murmr | Extract embeddings, store anywhere | Full portability via voice_clone_prompt |

The difference is architectural. Most providers treat voice identity as platform state. murmr treats it as data you own.

Getting Started

Install the SDK and start extracting embeddings in minutes.

bash
# Python
pip install murmr

# Node.js
npm install @murmr/sdk

murmr Studio uses this pattern

murmr Studio — our voice design app — uses portable embeddings in production. Each Studio user's voices are stored with their own embeddings, completely independent of murmr's saved voices system.

For the full API reference, including supported audio formats and transcript requirements, see the Portable Embeddings documentation. If you're building a multi-tenant application, this is the recommended integration pattern — it scales with your users, not against them.

Ready to try it? Get 50,000 free characters per month — no credit card required.
