Mistral 7B LLM, GPU comparisons, model observability features, and an open source AI event series
This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
The latest version of Truss addresses the most common pain points in packaging and serving ML models. Plus, learn how to optimize Stable Diffusion XL inference to run in as little as 3 seconds and build your own open-source version of ChatGPT with Llama 2 and Chainlit.
Out of the box, Stable Diffusion XL 1.0 (SDXL) takes 8-10 seconds to create a 1024x1024px image from a prompt on an A100 GPU. Here’s everything I did to cut SDXL invocation time to as little as 1.92 seconds on an A100.
Llama 2 is an open-source LLM whose output quality is competitive with GPT-3.5, the model that powers ChatGPT. Chainlit is an open-source tool for building ChatGPT-style interfaces. This tutorial shows you how to combine the two to create a chat interface for your favorite open-source LLMs, like Llama 2.
AudioGen, part of the AudioCraft family of models from Meta AI, is now available in the Baseten model library.
Llama 2 and SDXL shake up foundation model leaderboards (plus: Langchain, autoscaling, and more)