This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
The latest version of Truss brings new solutions for the most common pain points in packaging and serving ML models. Plus, learn how to optimize Stable Diffusion XL inference to run in as little as 3 seconds and build your own open-source version of ChatGPT with Llama 2 and Chainlit.
Out of the box, Stable Diffusion XL 1.0 (SDXL) takes 8-10 seconds to create a 1024x1024px image from a prompt on an A100 GPU. Here’s everything I did to cut SDXL invocation time to as little as 1.92 seconds on an A100.
Llama 2 is an open-source LLM whose output quality is competitive with GPT-3.5, which powers ChatGPT. Chainlit is an open-source tool for creating a ChatGPT-style interface. This tutorial shows you how to build a ChatGPT-style interface for your favorite open-source LLMs, such as Llama 2.
AudioGen, part of the AudioCraft family of models from Meta AI, is now available in the Baseten model library.
Llama 2 and SDXL shake up foundation model leaderboards (plus: LangChain, autoscaling, and more)
AI infrastructure, ML infrastructure, build vs. buy, model deployment
Build a ChatGPT-style chatbot with open-source Llama 2 and LangChain in a Python notebook.
Deploy Stable Diffusion XL 1.0 for free to generate images from text prompts and invoke Stable Diffusion with the Baseten Python client.
An in-depth look at open-source foundation models, both LLMs and image models: Llama 2 from Meta and Microsoft, FreeWilly1 and FreeWilly2 from Stability AI, SDXL 1.0 (Stable Diffusion XL) also from Stability AI, LayoutLM Document QA from Impira, and NSQL 350M from Numbers Station.