text to speech

Rime Mist v3

Rime Mist v3 is an updated inference engine for the Mist text-to-speech model.

Model details

Example usage

Concurrency & Performance

Mist v3 is an ultra-low-latency TTS model for real-time systems and AI agents, with time to first byte (TTFB) as low as 40 ms on GPUs such as Ada Lovelace (L40S) and Blackwell (RTX PRO 6000). Its concurrency headroom makes it well suited to steady-state production loads without complex autoscaling. For agent-native companies, where TTS is often the throughput bottleneck, this headroom is particularly significant.
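To check TTFB on your own deployment, you can time how long it takes for the first byte of audio to arrive. The sketch below is a minimal, hypothetical helper (measure_ttfb is not part of any Rime or Baseten SDK). It starts the clock just before the request is opened and stops it when the first body byte is read, so it includes network round trip and TLS setup and will read higher than a server-side figure.

```python
import time
from typing import BinaryIO, Callable, Tuple


def measure_ttfb(open_stream: Callable[[], BinaryIO], chunk_size: int = 4096) -> Tuple[float, bytes]:
    """Return (seconds until the first body byte arrived, full response body).

    open_stream is any zero-argument callable that sends the request and
    returns a readable, file-like response object.
    """
    start = time.perf_counter()
    stream = open_stream()
    try:
        # Read a single byte so the timer stops as soon as any audio arrives,
        # rather than waiting for a full chunk to fill.
        first = stream.read(1)
        ttfb = time.perf_counter() - start
        chunks = [first]
        # Drain the rest of the stream so the full audio is returned too.
        while chunk := stream.read(chunk_size):
            chunks.append(chunk)
    finally:
        stream.close()
    return ttfb, b"".join(chunks)
```

With the request object built as in the Quickstart, you could call measure_ttfb(lambda: urllib.request.urlopen(request)) and repeat it several times, since a single measurement is noisy.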

On-Prem Deployment via Baseten

Existing Rime on-prem licensees can bring their Rime license to Baseten and get scalable inference on day one, without sub-processor onboarding or additional vendor agreements. Contact Rime to inquire about on-prem licensing.

Quickstart

Here is a basic example payload:

```json
{
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "lang": [
        "en",
        "es",
        "fr",
        "de"
    ]
}
```

Sending this request in Python:

```python
import json
import urllib.request

# Replace with your Baseten API key (ideally loaded from an environment variable).
BASETEN_API_KEY = "your_api_key_here"

# Request MP3 audio back and send the payload as JSON.
headers = {
    "Accept": "audio/mp3",
    "Authorization": f"Bearer {BASETEN_API_KEY}",
    "Content-Type": "application/json"
}

# Required parameters: text, speaker, and modelId.
payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "modelId": "mistv3"
}

data = json.dumps(payload).encode("utf-8")

# Replace <endpoint> with your deployment's endpoint.
BASETEN_ENDPOINT = "https://<endpoint>.api.baseten.co/environments/production/predict"

request = urllib.request.Request(
    BASETEN_ENDPOINT,
    data=data,
    headers=headers,
    method="POST"
)

# Stream the response to disk in 4 KB chunks instead of buffering it all in memory.
with urllib.request.urlopen(request) as response:
    with open("output.mp3", "wb") as f:
        while chunk := response.read(4096):
            f.write(chunk)

print("Audio saved to output.mp3")
```
Example JSON output:

```json
{
    "audio": "base64 stream"
}
```
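The Quickstart code writes the response body straight to output.mp3, which fits the Accept: audio/mp3 header. If your deployment returns the JSON shape shown above instead, the audio has to be base64-decoded before it is saved. This is a minimal sketch, assuming the body is JSON with standard base64 audio in an "audio" field; save_audio_from_json is a hypothetical helper, not part of any SDK.

```python
import base64
import json


def save_audio_from_json(body: bytes, path: str) -> int:
    """Decode a {"audio": "<base64>"} response body and write the audio to path.

    Returns the number of audio bytes written.
    """
    # Parse the JSON body and pull out the base64-encoded audio string.
    audio_b64 = json.loads(body)["audio"]
    # Decode back to raw audio bytes (for example, MP3 data).
    audio = base64.b64decode(audio_b64)
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

In that case you would read the whole response with response.read() and pass it to save_audio_from_json rather than streaming chunks to disk, because base64 inside JSON cannot be decoded until the full string has arrived.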

2. Payload Parameters

2.1 Required parameters

  • text (str): The text to convert to speech.

  • speaker (str): The voice to use. Browse all available voices at rime.ai/docs/voices.

  • modelId (str): Set to mistv3 for this model.
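A small helper can assemble payloads and fail fast when a required field is missing. This is a minimal sketch: build_payload is a hypothetical helper, not part of any Rime or Baseten SDK. It only checks that the three required fields are non-empty strings and passes any optional parameters through unchanged, without validating them.

```python
from typing import Any, Dict

# The three required fields listed above.
REQUIRED_FIELDS = ("text", "speaker", "modelId")


def build_payload(text: str, speaker: str, model_id: str = "mistv3", **optional: Any) -> Dict[str, Any]:
    """Assemble a request payload and reject empty required fields.

    Extra keyword arguments are added as optional parameters as-is.
    """
    payload: Dict[str, Any] = {"text": text, "speaker": speaker, "modelId": model_id, **optional}
    for field in REQUIRED_FIELDS:
        value = payload[field]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{field} must be a non-empty string")
    return payload
```

For example, build_payload("Hello! This is Rime speaking.", "celeste") produces the same dictionary as the Quickstart payload, and build_payload("Hi", "peak", phonemizeBetweenBrackets=True) adds the optional flag.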

2.2 Optional parameters

Implementation Notes

Custom Pronunciation

Mist v3's phoneme-first architecture gives you deterministic, highly controllable pronunciation, which is critical for brand names, medical terms, and domain-specific vocabulary. To use custom pronunciation, replace the word with its phonetic string wrapped in curly braces and set phonemizeBetweenBrackets to true.

Use the Pronunciation tool in the Rime dashboard to generate phonetic strings for any word. For example:

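```python
# {r1Ym} is a phonetic string; phonemizeBetweenBrackets tells the model
# to pronounce bracketed text as phonemes rather than as ordinary words.
payload = {
    "text": "Welcome to {r1Ym} labs.",
    "speaker": "peak",
    "modelId": "mistv3",
    "phonemizeBetweenBrackets": True
}
```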
Example JSON output:

```json
{
    "audio": "base64 stream"
}
```
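When many brand or domain terms need custom pronunciation, you can keep a word-to-phoneme map and insert the bracketed strings programmatically. This is a minimal sketch: apply_pronunciations is a hypothetical helper, the mapping of "Rime" to r1Ym is assumed from the example above, and real phonetic strings should come from the Pronunciation tool. It matches whole words case-sensitively, so keys should be plain words or phrases.

```python
import re
from typing import Dict


def apply_pronunciations(text: str, phonemes: Dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key in phonemes with "{phonetic}"."""
    if not phonemes:
        return text
    # Try longer keys first so a phrase like "Rime Labs" wins over "Rime".
    alternatives = sorted(phonemes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in alternatives) + r")\b")
    # Wrap the phonetic string in curly braces, as the payload format expects.
    return pattern.sub(lambda m: "{" + phonemes[m.group(1)] + "}", text)
```

The result goes in the text field of a payload with phonemizeBetweenBrackets set to True, for example {"text": apply_pronunciations("Welcome to Rime labs.", {"Rime": "r1Ym"}), "speaker": "peak", "modelId": "mistv3", "phonemizeBetweenBrackets": True}.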
