Philip Kiely

Head of AI Education

Model performance

How multi-node inference works for massive LLMs like DeepSeek-R1

Phil Howes

Philip Kiely

Multi-node inference

Infrastructure

Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud

Pankaj Gupta

Philip Kiely

Testing GH200 GPUs

AI engineering

Private, secure DeepSeek-R1 in production in US & EU data centers

Amir Haghighat

Philip Kiely

Yineng Zhang

2 others

DeepSeek R1

Model performance

How we built production-ready speculative decoding with TensorRT-LLM

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others

Speculative Decoding with TensorRT-LLM

Model performance

A quick introduction to speculative decoding

Pankaj Gupta

Philip Kiely

Pankaj Gupta

2 others

Intro to Speculative Decoding

Infrastructure

Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference

Pankaj Gupta

Philip Kiely

NVIDIA H200

News

Export your model inference metrics to your favorite observability tool

Helen Yang

Nicolas Gere-lamaysouette

Philip Kiely

Export your inference metrics

Community

Building high-performance compound AI applications with MongoDB Atlas and Baseten

Philip Kiely

MongoDB + Baseten

Model performance

How to build function calling and JSON mode for open-source and fine-tuned LLMs

Bryce Dubayah

Philip Kiely

JSON Mode
