Jan 11, 2024

NVIDIA L4 GPUs now generally available on Baseten

You can now deploy models to instances powered by the L4 GPU on Baseten. NVIDIA’s L4 GPU is an Ada Lovelace series GPU with:

121 teraFLOPS of float16 compute
24 GB of VRAM with 300 GB/s of memory bandwidth

While the L4 is the next-gen successor to the T4, it’s natural to compare it to the A10 instead, as both have 24 GB of VRAM. However, the two are suited to different workloads.

Thanks to its high compute power, the L4 is great for:

Image generation models like Stable Diffusion XL
Batch jobs of Whisper and other transcription tasks
Any compute-bound model inference tasks

However, because its memory bandwidth is lower than the A10’s, the L4 is not well suited for:

Most LLM inference tasks, like running Mistral 7B or Llama 7B for chat
Most autoregressive model inference
Any memory-bound model inference tasks
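
To make the compute-bound versus memory-bound distinction concrete, here is a rough back-of-the-envelope sketch using only the two L4 spec figures quoted above. The function names are hypothetical. The decode estimate assumes batch size 1, float16 weights, and that every weight is read from VRAM once per generated token, so it is an upper bound and not a benchmark.

```python
# Spec figures quoted in this announcement for the L4.
L4_FP16_TFLOPS = 121.0   # peak float16 compute, in teraFLOPS
L4_MEM_BW_GBPS = 300.0   # memory bandwidth, in GB/s


def ridge_point(tflops: float, bandwidth_gbps: float) -> float:
    """FLOPs per byte above which a workload is compute-bound (roofline model)."""
    return (tflops * 1e12) / (bandwidth_gbps * 1e9)


def max_decode_tokens_per_sec(
    params_billions: float, bytes_per_param: float, bandwidth_gbps: float
) -> float:
    """Bandwidth-limited ceiling on decode speed.

    Assumes batch size 1 and that all weights are read once per token;
    ignores KV-cache traffic and kernel overheads.
    """
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return (bandwidth_gbps * 1e9) / weight_bytes


if __name__ == "__main__":
    ridge = ridge_point(L4_FP16_TFLOPS, L4_MEM_BW_GBPS)
    ceiling = max_decode_tokens_per_sec(7, 2, L4_MEM_BW_GBPS)
    print(f"L4 ridge point: {ridge:.0f} FLOPs/byte")            # about 403
    print(f"7B float16 decode ceiling: {ceiling:.1f} tokens/s")  # about 21.4
```

Under these assumptions, single-stream decoding performs roughly 2 FLOPs per parameter per token, or about 1 FLOP per byte of float16 weights read. That is far below the ridge point of roughly 403 FLOPs/byte, which is why chat-style LLM inference tends to be limited by memory bandwidth, while workloads that reuse each loaded byte many times (image generation, large-batch transcription) can make better use of the L4's compute.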

L4-based instances start at $0.8484/hour — about 70% of the cost of an A10G-based instance. L4 GPU instances are priced as follows:

If you have any questions about using L4 GPUs for model inference, please let us know at support@baseten.co.