We’ve all seen the excitement that Stable Diffusion and its ecosystem have created over the last few months. Give it some text, get back an image, repeat — great.
What we've only started to see, though, is how these models will be incorporated into existing and greenfield creative workflows. That will be an exciting step in the evolution of AI-powered tools over the coming months and years.
A few of us recently spent a late evening sketching out a project called DreamCanvas. It took a couple of days to build, but even the proof of concept feels like a powerful way of interacting with these new tools.
Exploring new creative interfaces with AI
We've seen lots of front-ends for training and fine-tuning Stable Diffusion with Dreambooth: often clunky, and generally not very exploratory. Feedback loops tend to be broken because training is arduous, manual, and infra-intensive. And validating data and incorporating outputs back into an existing tool makes for a slow and interruption-prone workflow unsuited for open-ended exploration.
In the past, I lost steam on these types of projects because not only did I need to figure out how the front-end and workflow would work, but I also had to figure out how to host and scale the machine-learning part of the equation, which is a challenge even for demos.
Starting in FigJam
FigJam widgets let you build new capabilities directly into the user's canvas, and I've always liked tactile, direct-manipulation interfaces. The idea: let the user collect training data on the canvas, fine-tune a model, and keep that tuned model on the canvas to generate new images.
This is a rough sketch of the workflow from the user's perspective:
- Collect training images in a section like a moodboard
- Add the DreamCanvas widget to the section and hit Train
- Bring the trained model widget into other sections with prompts to generate images
FigJam's Plugin API is powerful, so building the moodboard-like sections and uploading was fairly straightforward (shout out to Replit for making it easy to spin up small bespoke services as needed).
Fine-tuning with Blueprint
With the skeleton for gathering and uploading training data in place, I needed an easy way to fine-tune models without taking on a second side project of maintaining a pile of machine-learning infrastructure. I found a ton of tutorials, but nothing that just worked or properly outsourced the tedium of dealing with the AWS/GCP APIs.
Blueprint does just this:
- Provides an API to fine-tune Stable Diffusion. I can fine-tune using Dreambooth or the full Stable Diffusion model, and they also gave me a bunch of credits to get started.
- Takes care of deploying these fine-tuned models, also via an API. It took one line of Python to trigger a fine-tuning job and have it auto-deployed.
- Gives me a performant API to access the model.
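I don't want to misrepresent Blueprint's actual client, so the URL and field names below are placeholders, but kicking off a fine-tuning job amounts to building a small job description and POSTing it:

```python
import json
import urllib.request

def build_finetune_job(image_urls, base_model="stable-diffusion-v1-5", steps=800):
    """Assemble the request body for a fine-tuning job.

    The field names are illustrative; the real schema may differ.
    """
    return {
        "base_model": base_model,
        "instance_images": list(image_urls),
        "steps": steps,
    }

def start_finetune(job, token, api_url="https://api.example.com/v1/finetune"):
    """POST the job and return its id. The URL is a stand-in, not Blueprint's."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(job).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

The point is the shape of the interaction: one request in, a job id out, and deployment handled for you on the other side.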
Blueprint also provides a way to define quick endpoints around the model. I would have done this with Flask on Replit, but not having to stand up a new service, deal with CORS, and so on made it pretty quick for me to build my API and saved me from unnecessary rabbit holes.
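Conceptually, what those quick endpoints replace is a small dispatcher like the toy below (this is my sketch, not Blueprint's API): a mapping from paths to handler functions, with nothing else to stand up.

```python
def make_api(train_fn, status_fn, imagine_fn):
    """Wire three handlers to paths; a stand-in for quick endpoint definitions."""
    routes = {"/train": train_fn, "/status": status_fn, "/imagine": imagine_fn}

    def handle(path, payload):
        # Unknown paths return an error payload instead of raising, like a 404.
        handler = routes.get(path)
        if handler is None:
            return {"error": f"unknown endpoint {path}"}
        return handler(payload)

    return handle
```

Everything that isn't in this sketch (hosting, TLS, CORS, scaling) is exactly the part I was happy to outsource.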
The end result
With a few simple API endpoints (/train, /status, /imagine), I made a multiplayer-enabled (!) canvas with live-trained ML models living in it. Multiple people can come together and try out the model; you can alt-drag trained models to try out explorations without losing your history; and you can mark it up with pencil drawings and stickies and anything else you've gotten used to in FigJam and Figma.
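From the widget's point of view, the flow over those three endpoints is a train/poll/generate loop. Here is a minimal sketch, where `client` is any hypothetical wrapper exposing the three calls:

```python
import time

def generate_with_fresh_model(client, image_urls, prompt, poll_seconds=5.0):
    """Kick off training, wait for deployment, then ask for an image.

    `client` is any object exposing train/status/imagine methods that
    mirror the /train, /status, and /imagine endpoints.
    """
    job_id = client.train(image_urls)
    # Fine-tuning takes minutes, so the widget polls rather than blocks a request.
    while client.status(job_id) != "deployed":
        time.sleep(poll_seconds)
    return client.imagine(job_id, prompt)
```

Keeping the loop in the widget is what makes the canvas feel live: the moment `/status` flips to deployed, every collaborator can start prompting the model.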
Here's the demo!
Want to play with it?
DreamCanvas is not ready for primetime just yet (I need to add auth to make sure I don't end up footing a massive compute bill, and to make some performance improvements), but you can reach out to me on Twitter @msfeldstein if you want to play with it.