Abu Qader
Software Engineer
Model performance
How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs
Amir Haghighat and 4 others
News
Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference
Justin Yi and 3 others
Model performance
How to double tokens per second for Llama 3 with Medusa
Abu Qader and 1 other
News
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder
Abu Qader and 1 other
Model performance
Benchmarking fast Mistral 7B inference
Abu Qader and 3 others
Model performance
Introduction to quantizing ML models
Abu Qader and 1 other