Whisper V3
A speech-to-text model for accurately transcribing audio across dozens of languages.
Deploy Whisper V3 behind an API endpoint in seconds.
Example usage
The model accepts a single URL to an audio file, such as a .mp3 or .wav. The audio file should contain clearly audible speech. This example transcribes a ten-second snippet of a recitation of the Gettysburg Address.
The JSON output includes the auto-detected language, transcription segments with timestamps, and the complete transcribed text.
import os

import requests

# Replace the empty string with your model id below
model_id = ""

# Read the API key from the environment.
# The variable name BASETEN_API_KEY is an assumption; use whatever name you export.
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

# Print the output of the model
print(res.json())
{
    "language": "english",
    "segments": [
        {
            "start": 0,
            "end": 6.5200000000000005,
            "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation"
        },
        {
            "start": 6.5200000000000005,
            "end": 11,
            "text": "conceived in liberty and dedicated to the proposition that all men are created equal."
        }
    ],
    "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal."
}
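Because each segment carries its own start and end time, the output can be turned into timestamped lines with a few lines of post-processing. The sketch below is one possible approach, not part of the model's API: the helper names `format_timestamp` and `timestamped_lines` are hypothetical, and it assumes the response has the shape shown above, with times given in seconds. The `sample` dictionary is a copy of the example output, so the code runs without calling the endpoint.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm (hypothetical helper)."""
    total_ms = round(seconds * 1000)  # round away float noise such as 6.5200000000000005
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def timestamped_lines(response: dict) -> list[str]:
    """Build one '[start - end] text' line per segment (hypothetical helper)."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in response.get("segments", [])
    ]


# Copy of the example output above, used here in place of res.json()
sample = {
    "language": "english",
    "segments": [
        {
            "start": 0,
            "end": 6.5200000000000005,
            "text": "Four score and seven years ago, our fathers brought forth upon this continent a new nation",
        },
        {
            "start": 6.5200000000000005,
            "end": 11,
            "text": "conceived in liberty and dedicated to the proposition that all men are created equal.",
        },
    ],
}

for line in timestamped_lines(sample):
    print(line)
```

In a real call, passing `res.json()` instead of `sample` should work the same way, provided the request succeeded and the response follows the structure shown in the example output.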