Philip Kiely, Lead Developer Advocate
- Infrastructure: Using fractional H100 GPUs for efficient model serving (Matt Howard and 3 others)
- Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
- Model performance: 33% faster LLM inference with FP8 quantization (Pankaj Gupta and 1 other)
- Model performance: High performance ML inference with NVIDIA TensorRT (Justin Yi and 1 other)
- Model performance: FP8: Efficient model inference with 8-bit floating point numbers (Pankaj Gupta and 1 other)
- Infrastructure: The benefits of globally distributed infrastructure for model serving (Phil Howes and 1 other)
- Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (Pankaj Gupta and 2 others)
- Model performance: Why GPU utilization matters for model inference (Marius Killinger and 1 other)
- AI engineering: The best open source large language model (Philip Kiely)