Philip Kiely, Lead Developer Advocate
Model performance: How we built production-ready speculative decoding with TensorRT-LLM (Pankaj Gupta and 2 others)
Model performance: A quick introduction to speculative decoding (Pankaj Gupta and 2 others)
Infrastructure: Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference (Pankaj Gupta and 1 other)
News: Export your model inference metrics to your favorite observability tool (Helen Yang and 2 others)
Community: Building high-performance compound AI applications with MongoDB Atlas and Baseten (Philip Kiely)
Model performance: How to build function calling and JSON mode for open-source and fine-tuned LLMs (Bryce Dubayah and 1 other)
News: Introducing function calling and structured output for open-source and fine-tuned LLMs (Bryce Dubayah and 1 other)
AI engineering: The best open-source image generation model (Philip Kiely)
Model performance: How to double tokens per second for Llama 3 with Medusa (Abu Qader and 1 other)