Whisper Large V2
Whisper Large V2, optimized to achieve ~1800x real-time factor (1 hour of audio transcribed in 2 seconds).
Model details
Example usage
Transcribe audio files at up to an 1800x real-time factor (RTF). To add streaming or diarization, get in touch with our engineers. Learn more about our optimized Whisper transcription pipeline in our launch blog.
Recommended setups for different use cases:
Balanced
GPU type: H100 MIG
Concurrency target: <= 18
Highly latency-sensitive
GPU type: H100
Concurrency target: <= 32
Cost-sensitive
GPU type: L4
Concurrency target: <= 12
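The concurrency targets above are per-deployment caps, so a client fanning out many audio files should throttle its own request parallelism to match. A minimal sketch, assuming a hypothetical `transcribe` callable (anything that posts one file and returns the parsed response):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(items, worker, max_concurrency=18):
    # Cap in-flight calls at the deployment's concurrency target
    # (e.g. 18 for the balanced H100 MIG setup above).
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(worker, items))

# Hypothetical usage:
# results = run_batch(audio_urls, transcribe, max_concurrency=18)
```

`ThreadPoolExecutor` is a good fit here because the work is I/O-bound HTTP; the pool size, not the request code, enforces the cap.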
Input
import requests
import os

model_id = "" # Add the model ID for your deployment

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Define the request payload
payload = {
    "whisper_input": {
        "audio": {
            "url": "https://example.com/audio.wav"
            # "audio_b64": "BASE64_ENCODED_AUDIO" # Uncomment if using Base64
        },
        "whisper_params": {
            "prompt": "Optional transcription prompt",
            "audio_language": "en"
        }
    }
}

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=payload
)

print(resp.json())

JSON output
{
    "language_code": "en",
    "language_prob": null,
    "segments": [
        {
            "text": "That's one small step for man, one giant leap for mankind.",
            "log_prob": -0.8644908666610718,
            "start_time": 0,
            "end_time": 9.92
        }
    ]
}
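Each entry in `segments` carries the transcribed text plus start/end timestamps in seconds, so stitching the response into a readable transcript is a short loop. A minimal sketch (the `response` shape follows the JSON output above; the helper name is our own):

```python
def segments_to_transcript(response):
    # Join segment texts in order, prefixing each with
    # its [start-end] timestamps in seconds.
    lines = []
    for seg in response["segments"]:
        lines.append(
            f"[{seg['start_time']:.2f}-{seg['end_time']:.2f}] {seg['text']}"
        )
    return "\n".join(lines)
```

For the example response above this yields a single line, `[0.00-9.92] That's one small step for man, one giant leap for mankind.`; longer audio produces one timestamped line per segment.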