You can now deploy models to instances powered by the L4 GPU on Baseten. NVIDIA’s L4 GPU is an Ada Lovelace series GPU with:
121 teraFLOPS of float16 compute
24 GB of VRAM with 300 GB/s of memory bandwidth
While the L4 is the next-generation successor to the T4, it's natural to compare it instead to the A10, as both have 24 GB of VRAM. However, the two are better suited to different workloads.
Thanks to its high compute power, the L4 is great for:
Image generation models like Stable Diffusion XL
Batch jobs of Whisper and other transcription tasks
Any compute-bound model inference tasks
However, due to its lower memory bandwidth, the L4 is not well suited for:
Most LLM inference tasks, like running Mistral 7B or Llama 7B for chat
Most autoregressive model inference
Any memory-bound model inference tasks
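The compute-bound vs. memory-bound split above can be sketched with a back-of-envelope roofline calculation using the L4 specs quoted earlier. The numbers and the `bound` helper below are illustrative assumptions, not a formal performance model:

```python
# Roofline-style sketch using the L4 specs from this post (illustrative only).
L4_FLOPS_FP16 = 121e12   # 121 teraFLOPS of float16 compute
L4_BANDWIDTH = 300e9     # 300 GB/s memory bandwidth, in bytes/s

# The "ridge point": how many FLOPs the GPU can perform per byte it
# moves from VRAM. Workloads below this arithmetic intensity are
# memory-bound; workloads above it are compute-bound.
ridge = L4_FLOPS_FP16 / L4_BANDWIDTH  # roughly 400 FLOPs/byte

# Single-stream LLM decoding reads each weight once per generated token
# and does about 2 FLOPs per weight (multiply + add). In float16 each
# weight is 2 bytes, so arithmetic intensity is around 1 FLOP/byte.
llm_decode_intensity = 2 / 2

def bound(intensity: float, ridge_point: float) -> str:
    """Classify a workload by comparing its arithmetic intensity to the ridge."""
    return "compute-bound" if intensity >= ridge_point else "memory-bound"

print(bound(llm_decode_intensity, ridge))  # memory-bound: decoding starves on bandwidth
print(bound(500, ridge))                   # e.g. large batched matmuls: compute-bound
```

At ~1 FLOP/byte, chat-style LLM inference sits far below the L4's ~400 FLOP/byte ridge point, which is why it saturates the 300 GB/s memory bus long before the 121 teraFLOPS of compute; image generation and batched transcription push much higher arithmetic intensity and land on the compute side.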
L4-based instances start at $0.8484/hour, about 70% of the cost of an A10G-based instance.
If you have any questions about using L4 GPUs for model inference, please let us know at email@example.com.