NVIDIA Nemotron 3.5 ASR Streaming Multilingual (0.6B)

600M Cache-Aware FastConformer-RNNT streaming ASR for real-time multilingual voice agents across ~36 languages, with native punctuation and low latency.

Overview

NVIDIA Nemotron 3.5 ASR Streaming Multilingual is a 600M-parameter Cache-Aware FastConformer-RNNT streaming speech recognition model built for real-time multilingual voice agents. It transcribes roughly 36 languages across 40 language-locale pairs, emits native punctuation and capitalization, and supports runtime-configurable streaming latency with chunk sizes as low as 80 ms.

Capabilities

  • Real-time streaming transcription with strictly non-overlapping chunks (no buffered-inference redundancy).

  • Prompt-guided language selection: pass target_lang as a locale tag, or "auto" for automatic language identification.

  • Native punctuation and capitalization in the output transcript.

  • Low-latency chunked decoding with configurable attention context (chunk sizes down to 80 ms).
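
To make the chunking arithmetic concrete, here is a minimal sketch of splitting audio into strictly non-overlapping 80 ms chunks. It assumes 16 kHz mono PCM samples, which is a common ASR input rate but is not stated on this page. The helper names `samples_per_chunk` and `split_chunks` are hypothetical and not part of any NVIDIA or Baseten API.

```python
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000  # assumption: the page does not state the input sample rate


def samples_per_chunk(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples covered by one chunk of chunk_ms milliseconds."""
    return sample_rate_hz * chunk_ms // 1000


def split_chunks(samples: Sequence[int], chunk_ms: int = 80,
                 sample_rate_hz: int = SAMPLE_RATE_HZ) -> Iterator[Sequence[int]]:
    """Yield consecutive, non-overlapping chunks; the last one may be shorter."""
    step = samples_per_chunk(chunk_ms, sample_rate_hz)
    if step <= 0:
        raise ValueError("chunk_ms too small for this sample rate")
    for start in range(0, len(samples), step):
        # Each sample lands in exactly one chunk, so nothing is processed twice.
        yield samples[start:start + step]


if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE_HZ
    chunks = list(split_chunks(one_second))
    print(samples_per_chunk(80), len(chunks))
```

Under the 16 kHz assumption, an 80 ms chunk covers 1,280 samples. Because the chunks do not overlap, the chunk lengths always add up to the input length.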

Use cases

  • Real-time voice agents.

  • Live captioning.

  • Multilingual transcription pipelines.

This model uses a custom JSON request (it is not OpenAI-compatible). POST an audio_url (or base64 audio_b64) along with a target_lang locale tag, or "auto" for language ID, to the deployment's predict endpoint.

Input
import requests

model_id = ""  # place your deployment's model ID here

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": "Api-Key BASETEN-API-KEY"},
    json={
        "audio_url": "https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav",
        "target_lang": "auto",
    },
)

print(resp.json())

# Pass a specific locale (for example "es-ES") instead of "auto" to force a
# language, or send audio inline as base64 with the "audio_b64" field.

JSON output
{
    "text": "The cut on his chest still dripping blood, the ache of his overstrained eyes, even the soaring arena around him with the thousands of spectators, were trivialities not worth a thought.",
    "target_lang": "auto"
}
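
For local files, the request can carry the audio inline instead of by URL. The sketch below builds either payload shape described above and reads the `text` field from a response shaped like the sample output. The helper names `build_payload` and `extract_text` are hypothetical, and the code assumes that `audio_b64` takes the standard base64 encoding of the raw audio file bytes, which this page does not confirm.

```python
import base64
from typing import Optional


def build_payload(audio_url: Optional[str] = None,
                  audio_bytes: Optional[bytes] = None,
                  target_lang: str = "auto") -> dict:
    """Build the JSON body; exactly one of audio_url or audio_bytes is required."""
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")
    payload = {"target_lang": target_lang}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    else:
        # Assumption: audio_b64 is standard base64 of the raw audio file bytes.
        payload["audio_b64"] = base64.b64encode(audio_bytes).decode("ascii")
    return payload


def extract_text(response_json: dict) -> str:
    """Return the transcript from a response shaped like the sample output."""
    if "text" not in response_json:
        raise KeyError(f"no 'text' field in response: {response_json!r}")
    return response_json["text"]
```

The returned dict can be passed as the `json=` argument of the `requests.post` call in the Input example, for instance `build_payload(audio_bytes=open("clip.wav", "rb").read(), target_lang="es-ES")`, where `clip.wav` is a placeholder file name.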
