Rime Mist v3
Rime Mist v3 is an updated inference engine for the Mist text-to-speech model.
Concurrency & Performance
Mist v3 is an ultra-low-latency TTS model for real-time systems and AI agents, with time to first byte (TTFB) as low as 40 ms on Ada Lovelace (L40S) and Blackwell (RTX PRO 6000) GPUs, making it well suited for steady-state production loads without complex auto-scaling. For agent-native companies where TTS is often the throughput bottleneck, this concurrency headroom is particularly significant.
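To check TTFB from your own client, one rough approach is to time how long the first chunk of the response takes to arrive. The sketch below is an illustration only: measure_ttfb and its open_stream argument are hypothetical names, not part of any Rime or Baseten SDK, and a client-side measurement includes network and connection setup time, so it will generally read higher than a server-side figure.

```python
import time
from typing import BinaryIO, Callable, Tuple


def measure_ttfb(open_stream: Callable[[], BinaryIO],
                 chunk_size: int = 4096) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total bytes read).

    open_stream is any zero-argument callable that sends the request and
    returns a file-like response object with a read() method.
    """
    start = time.perf_counter()
    stream = open_stream()            # sends the request
    first = stream.read(chunk_size)   # blocks until the first bytes arrive
    ttfb = time.perf_counter() - start

    # Drain the rest of the response so the byte count is complete.
    total = len(first)
    while chunk := stream.read(chunk_size):
        total += len(chunk)
    return ttfb, total
```

With urllib, the callable could be something like lambda: urllib.request.urlopen(request), where request is a prepared urllib.request.Request for your endpoint.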
On-Prem Deployment via Baseten
Existing Rime on-prem licensees can bring their Rime license to Baseten and get scalable inference on day one, without sub-processor onboarding or additional vendor agreements. Contact Rime to inquire about on-prem licensing.
Quickstart
Here is a basic example payload:
{
  "text": "Hello! This is Rime speaking.",
  "speaker": "celeste",
  "lang": [
    "en",
    "es",
    "fr",
    "de"
  ]
}

Sending this request in Python:
import json
import urllib.request

BASETEN_API_KEY = "your_api_key_here"

# Request MP3 audio and authenticate with the Baseten API key.
headers = {
    "Accept": "audio/mp3",
    "Authorization": f"Bearer {BASETEN_API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "modelId": "mistv3"
}

# Serialize the payload as a UTF-8 encoded JSON body.
data = json.dumps(payload).encode("utf-8")

# Replace <endpoint> with the endpoint of your own deployment.
BASETEN_ENDPOINT = "https://<endpoint>.api.baseten.co/environments/production/predict"

request = urllib.request.Request(
    BASETEN_ENDPOINT,
    data=data,
    headers=headers,
    method="POST"
)

# Write the response body to disk in 4 KB chunks as it arrives.
with urllib.request.urlopen(request) as response:
    with open("output.mp3", "wb") as f:
        while chunk := response.read(4096):
            f.write(chunk)

print("Audio saved to output.mp3")

{
  "audio": "base64 stream"
}

2. Payload Parameters
2.1 Required parameters
text (str): The text to convert to speech.
speaker (str): The voice to use. Browse all available voices at rime.ai/docs/voices.
modelId (str): Set to mistv3 for this model.
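The three required parameters can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: build_payload is a hypothetical helper, not part of any Rime or Baseten SDK, and the non-empty-string check is an assumption about sensible input rather than a documented server rule.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("text", "speaker", "modelId")


def build_payload(text: str, speaker: str, model_id: str = "mistv3",
                  **optional: Any) -> Dict[str, Any]:
    """Assemble a request payload and validate the required fields."""
    payload: Dict[str, Any] = {
        "text": text,
        "speaker": speaker,
        "modelId": model_id,
    }

    # Each required field must be a non-empty string (client-side assumption).
    for name in REQUIRED_FIELDS:
        value = payload[name]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name!r} must be a non-empty string")

    # Optional parameters are passed through, but may not shadow required ones.
    clash = set(optional) & set(REQUIRED_FIELDS)
    if clash:
        raise ValueError(f"optional parameters overlap required ones: {sorted(clash)}")
    payload.update(optional)
    return payload
```

For example, build_payload("Hello! This is Rime speaking.", "celeste") yields the same payload used in the Quickstart request.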
2.2 Optional parameters
Implementation Notes
Custom Pronunciation
Mist v3's phoneme-first architecture gives you deterministic, highly controllable pronunciation, which is critical for brand names, medical terms, and domain-specific vocabulary. Use the Pronunciation tool in the Rime dashboard to generate phonetic strings for any word.
To use custom pronunciation, replace the word with its phonetic string wrapped in curly brackets and set phonemizeBetweenBrackets to true:
payload = {
    "text": "Welcome to {r1Ym} labs.",
    "speaker": "peak",
    "modelId": "mistv3",
    "phonemizeBetweenBrackets": True
}

{
  "audio": "base64 stream"
}
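When several words need custom pronunciation, the bracket substitution can be applied programmatically. The sketch below is one possible approach, not an official utility: apply_pronunciations and pronunciation_payload are hypothetical helper names, and the mapping assumes that r1Ym is the phonetic string for "Rime", as the example above suggests. Phonetic strings themselves should still come from the Pronunciation tool.

```python
import re
from typing import Any, Dict, Mapping


def apply_pronunciations(text: str, phonetics: Mapping[str, str]) -> str:
    """Replace each whole-word match with its phonetic string in curly brackets.

    Matching is case-sensitive and relies on word boundaries, so it suits
    plain alphanumeric words (not terms that start or end with punctuation).
    """
    if not phonetics:
        return text
    # A single alternation pass means inserted phonetic strings are never
    # re-scanned; longer words are tried first so they win over prefixes.
    words = sorted(phonetics, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: "{" + phonetics[m.group(1)] + "}", text)


def pronunciation_payload(text: str, speaker: str,
                          phonetics: Mapping[str, str]) -> Dict[str, Any]:
    """Build a payload with bracketed phonemes and the matching flag set."""
    return {
        "text": apply_pronunciations(text, phonetics),
        "speaker": speaker,
        "modelId": "mistv3",
        "phonemizeBetweenBrackets": True,
    }


print(apply_pronunciations("Welcome to Rime labs.", {"Rime": "r1Ym"}))
# prints: Welcome to {r1Ym} labs.
```

Keeping the word-to-phoneme mapping in one dictionary makes it easier to reuse the same pronunciations across every request an application sends.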