Coqui XTTS V2

State-of-the-art text-to-speech model

Example usage

This model requires at least two inputs:

  • text: The input text that needs to be spoken

  • speaker_voice: A base64-encoded audio file containing the voice of a single speaker

The model attempts to generate speech for the input text in the speaker's style. The output is returned as a base64-encoded string, so it must be decoded and written to an audio file before it can be played.

Input
import base64
import os

import requests

# Paste your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

def wav_to_base64(file_path):
  # Read the reference voice clip and encode it as a base64 string
  with open(file_path, "rb") as wav_file:
    binary_data = wav_file.read()
    base64_data = base64.b64encode(binary_data)
    base64_string = base64_data.decode("utf-8")
    return base64_string

def base64_to_wav(base64_string, output_file_path):
  # Decode the base64 string returned by the model and write it to disk
  binary_data = base64.b64decode(base64_string)
  with open(output_file_path, "wb") as wav_file:
    wav_file.write(binary_data)

voice = wav_to_base64("/path/to/wav/file/voice.wav")
text = "Listen up, people. Life's a wild ride, and sometimes you gotta grab it by the horns and steer it where you want to go. You can't just sit around waiting for things to happen – you gotta make 'em happen. Yeah, it's gonna get tough, but that's when you dig deep, find that inner badass, and come out swinging. Remember, success ain't handed to you on a silver platter; you gotta snatch it like it owes you money. So, lace up your boots, square those shoulders, and let the world know that you're here to play, and you're playing for keeps"
data = {"text": text, "speaker_voice": voice, "language": "en"}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

res = res.json()
# base64_to_wav writes the file and returns nothing, so there is no value to keep
base64_to_wav(res.get('output'), "output.wav")
JSON output
{
    "output": "iVBORw0KGgoAAAANSUhEU"
}
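
Before writing the decoded bytes to disk, it can help to confirm that the response actually contains playable audio. The following is a minimal sketch, not part of the original example: the helper name decode_wav_output is hypothetical, and it assumes the "output" field holds a PCM-encoded WAV file, which Python's standard wave module can parse. If the model returns a different encoding, the check raises wave.Error and would need to be adapted.

```python
import base64
import io
import wave


def decode_wav_output(response_json: dict) -> bytes:
    """Decode the base64 "output" field and check that it parses as WAV.

    Hypothetical helper: assumes the response is a dict with an "output"
    key holding base64-encoded PCM WAV data.
    """
    encoded = response_json.get("output")
    if not encoded:
        # An error response may not contain "output" at all.
        raise ValueError(
            f"no 'output' field in response; keys: {sorted(response_json)}"
        )

    audio_bytes = base64.b64decode(encoded)

    # wave.open raises wave.Error if the bytes are not a WAV file it can read.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
        duration = wav_file.getnframes() / wav_file.getframerate()
        print(
            f"{wav_file.getnchannels()} channel(s), "
            f"{wav_file.getframerate()} Hz, {duration:.2f} s"
        )

    return audio_bytes
```

With the request example above, this could be called as decode_wav_output(res) after res = res.json(), and the returned bytes written to output.wav.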