Whisper Large V2
Whisper Large V2, optimized to achieve ~1800x real-time factor (1 hour of audio transcribed in 2 seconds).
Model details
Example usage
Transcribe audio files at up to an 1800x real-time factor (RTF). To add streaming or diarization, get in touch with our engineers. Learn more about our optimized Whisper transcription pipeline in our launch blog.
Recommended setups for different use cases:
Balanced
GPU type: H100 MIG
Concurrency target: <= 18
Highly latency-sensitive
GPU type: H100
Concurrency target: <= 32
Cost-sensitive
GPU type: L4
Concurrency target: <= 12
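The concurrency targets above are per-deployment caps, so a client fanning out many audio files should throttle its own request parallelism to match. A minimal sketch, assuming a hypothetical `transcribe` callable (anything that posts one file and returns the parsed response):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(items, worker, max_concurrency=18):
    # Cap in-flight calls at the deployment's concurrency target
    # (e.g. 18 for the balanced H100 MIG setup above).
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(worker, items))

# Hypothetical usage:
# results = run_batch(audio_urls, transcribe, max_concurrency=18)
```

`ThreadPoolExecutor` is a good fit here because the work is I/O-bound HTTP; the pool size, not the request code, enforces the cap.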
Input
import requests
import os

model_id = "" # Add the model ID for your deployment

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Define the request payload
payload = {
    "whisper_input": {
        "audio": {
            "url": "https://example.com/audio.wav"
            # "audio_b64": "BASE64_ENCODED_AUDIO" # Uncomment if using Base64
        },
        "whisper_params": {
            "prompt": "Optional transcription prompt",
            "audio_language": "en"
        }
    }
}

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=payload
)

print(resp.json())

JSON output
{
    "language_code": "en",
    "language_prob": null,
    "segments": [
        {
            "text": "That's one small step for man, one giant leap for mankind.",
            "log_prob": -0.8644908666610718,
            "start_time": 0,
            "end_time": 9.92
        }
    ]
}
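Each entry in `segments` carries the transcribed text plus start/end timestamps in seconds, so stitching the response into a readable transcript is a short loop. A minimal sketch (the `response` shape follows the JSON output above; the helper name is our own):

```python
def segments_to_transcript(response):
    # Join segment texts in order, prefixing each with
    # its [start-end] timestamps in seconds.
    lines = []
    for seg in response["segments"]:
        lines.append(
            f"[{seg['start_time']:.2f}-{seg['end_time']:.2f}] {seg['text']}"
        )
    return "\n".join(lines)
```

For the example response above this yields a single line, `[0.00-9.92] That's one small step for man, one giant leap for mankind.`; longer audio produces one timestamped line per segment.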