Model library

Browse our library of open-source models that are ready to deploy behind an API endpoint in seconds.

11 vLLM models

LLM

Qwen3.5 9B Latency

V1 - Latency - vLLM - H100

LLM

Qwen3.5 35B-A3B Latency

V1 - Latency - vLLM - H100

LLM

Qwen3.5 122B-A10B Latency

V1 - Latency - vLLM - H100

LLM

Mistral Small 3.1

3.1 - vLLM - H100

LLM

Gemma 3 27B IT

3 - Instruct - vLLM - H100

LLM

Llama 4 Scout

V4.0 - Instruct - vLLM - H100

LLM

Qwen3.5 4B Latency

V1 - Latency - vLLM - H100

LLM

Llama 4 Maverick

V4.0 - Instruct - vLLM - B200

LLM

Seed OSS 36B Instruct

Seed OSS 36B Instruct - Instruct - vLLM - H100

LLM

Pixtral 12B

Pixtral - vLLM - H100

LLM

Phi 3.5 Mini Instruct

3.5 - 128k - vLLM - A10G

🔥 Trending models