"Inference Engineering" is now available. Get your copy here
transcription

Whisper Large V3

The most performant Whisper Large V3 implementation, achieving 1800x real-time factor (1 hour of audio transcribed in 2 seconds).

Model details

Example usage

Transcribe audio files at up to 1800x real-time factor (RTF). To add streaming or diarization, get in touch with our engineers. Learn more about our optimized Whisper transcription pipeline in our launch blog.
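For reference, real-time factor is simply audio duration divided by wall-clock processing time. A quick sketch of the arithmetic behind the 1800x figure (the helper name is illustrative):

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by processing time."""
    return audio_seconds / processing_seconds

# One hour of audio (3600 s) transcribed in 2 s gives an RTF of 1800x
rtf = real_time_factor(3600, 2)
```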


Try the example code below, or check out our API documentation for more detailed information.

Input
import requests
import os

model_id = ""  # Add the model ID for your deployment

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Define the request payload
payload = {
    "whisper_input": {
        "audio": {
            "url": "https://example.com/audio.wav"
            # "audio_b64": "BASE64_ENCODED_AUDIO"  # Uncomment if using Base64
        },
        "whisper_params": {
            "prompt": "Optional transcription prompt",
            "audio_language": "en"
        }
    }
}

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=payload
)

print(resp.json())
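If your audio is a local file rather than a hosted URL, the commented-out `audio_b64` field takes a Base64-encoded string in place of `url`. A minimal sketch (the helper function names are illustrative, not part of the API):

```python
import base64

def audio_file_to_b64(path: str) -> str:
    # Base64-encode the raw bytes of a local audio file
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_b64_payload(audio_b64: str, language: str = "en") -> dict:
    # Same payload shape as above, with "audio_b64" replacing "url"
    return {
        "whisper_input": {
            "audio": {"audio_b64": audio_b64},
            "whisper_params": {"audio_language": language}
        }
    }
```

Pass the resulting payload to the same `requests.post` call shown above.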
JSON output
{
    "language_code": "en",
    "language_prob": null,
    "segments": [
        {
            "text": "That's one small step for man, one giant leap for mankind.",
            "log_prob": -0.8644908666610718,
            "start_time": 0,
            "end_time": 9.92
        }
    ]
}
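To recover a single transcript string from the response, join the text of each entry in `segments`. A minimal sketch using the example response above (the function name is illustrative):

```python
def full_transcript(response: dict) -> str:
    # Concatenate each segment's text, in order
    return " ".join(seg["text"].strip() for seg in response["segments"])

example = {
    "language_code": "en",
    "language_prob": None,
    "segments": [
        {
            "text": "That's one small step for man, one giant leap for mankind.",
            "log_prob": -0.8644908666610718,
            "start_time": 0,
            "end_time": 9.92
        }
    ]
}

transcript = full_transcript(example)
```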
