Philip Kiely, Lead Developer Advocate
- Infrastructure: Using fractional H100 GPUs for efficient model serving (Matt Howard and 3 others)
- Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
- Model performance: 33% faster LLM inference with FP8 quantization (Pankaj Gupta and 1 other)
- Model performance: High performance ML inference with NVIDIA TensorRT (Justin Yi and 1 other)
- Model performance: FP8: Efficient model inference with 8-bit floating point numbers (Pankaj Gupta and 1 other)
- Infrastructure: The benefits of globally distributed infrastructure for model serving (Phil Howes and 1 other)
- Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (Pankaj Gupta and 2 others)
- Model performance: Why GPU utilization matters for model inference (Marius Killinger and 1 other)
- AI engineering: The best open source large language model (Philip Kiely)