Building on top of open source models gives you access to a wide range of capabilities that a black-box endpoint provider can't offer.
All-new model management, a text embedding model that matches OpenAI, and misgif, the most fun you’ll have with AI all week.
Mistral 7B LLM, GPU comparisons, model observability features, and an open source AI event series
The latest version of Truss brings new solutions for the most common pain points in packaging and serving ML models. Plus, learn how to optimize Stable Diffusion XL inference to run in as little as 3 seconds and build your own open-source version of ChatGPT with Llama 2 and Chainlit.
Llama 2 and SDXL shake up foundation model leaderboards (plus: Langchain, autoscaling, and more)
Stability AI announced the release of Stable Video Diffusion, marking a huge leap forward for open source video synthesis.
Switching from a closed source ecosystem where you consume ML models from API endpoints to the world of open source ML models can seem intimidating. But this checklist will give you all of the resources you need to make the leap.
When you depend on an open source package, like transformers from PyPI, the best practice is to pin the version you use so that breaking changes or security vulnerabilities aren't introduced into your codebase. You can do the same for model weights and associated code by pinning a model revision.
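As a minimal sketch of what pinning a model revision looks like with the Hugging Face transformers library (the model ID and revision below are illustrative placeholders, not from the post):

```python
def load_pinned(model_id: str, revision: str):
    """Load a model and tokenizer at an exact revision (commit SHA, tag, or branch).

    Passing revision= ensures you always get the same weights and code,
    even if the Hub repository is updated later -- analogous to pinning
    transformers==4.x.y in requirements.txt.
    """
    # Deferred import so the sketch stays lightweight; transformers is a heavy dependency.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModel.from_pretrained(model_id, revision=revision)
    return model, tokenizer
```

Pinning to a commit SHA is the strictest option; a tag or branch name also works but can move over time.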
A text embedding model transforms text into a vector of numbers that represents the text’s semantic meaning. There are a number of high-quality open source text embedding models for different use cases across search, recommendation, classification, and retrieval-augmented generation with LLMs.
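Whatever embedding model you pick, the downstream use cases above reduce to comparing vectors. A stdlib-only sketch of cosine similarity, with toy 3-dimensional "embeddings" standing in for the hundreds of dimensions a real model emits:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors (a real embedding model would produce these from text):
query = [0.9, 0.1, 0.0]
doc_relevant = [0.8, 0.2, 0.1]
doc_unrelated = [0.0, 0.1, 0.9]

# A semantically similar document scores higher against the query.
assert cosine_similarity(query, doc_relevant) > cosine_similarity(query, doc_unrelated)
```

In search or retrieval-augmented generation, you embed the query, score it against precomputed document embeddings exactly like this, and return the top matches.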
The A10 is an Ampere-series datacenter GPU well-suited to many model inference tasks, such as running seven billion parameter LLMs. However, AWS users run those same workloads on the A10G, a variant of the graphics card created specifically for AWS. The A10 and A10G have somewhat different specs — most notably around tensor compute — but are interchangeable for most model inference tasks because they share the same GPU memory and bandwidth, and most model inference is memory bound.
To get the most out of a GPU during LLM inference, you have to know whether the inference is compute bound or memory bound. Learn how to better utilize GPU resources.
This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.
If you’re using the ChatCompletions API and want to experiment with open source LLMs for your generative AI application, we’ve built a bridge that lets you try out models like Mistral 7B with just three tiny code changes.
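The reason a bridge like this only needs three changes is that the ChatCompletions wire format stays identical; only the endpoint, the credentials, and the model name differ. A stdlib-only sketch of the request (the URL, key, and model name below are placeholders, not the bridge's actual values):

```python
import json
import urllib.request

# Build an OpenAI-style chat completions request against a compatible endpoint.
req = urllib.request.Request(
    "https://example.com/v1/chat/completions",   # change 1: the endpoint (placeholder URL)
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # change 2: the provider's key
        "Content-Type": "application/json",
    },
    data=json.dumps({
        "model": "mistral-7b",                   # change 3: the open source model name
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
)
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

The same three changes apply if you use the official OpenAI Python SDK: point the client at a different base URL, swap the API key, and pass the new model name.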
Out of the box, Stable Diffusion XL 1.0 (SDXL) takes 8-10 seconds to create a 1024x1024px image from a prompt on an A100 GPU. Here’s everything I did to cut SDXL inference time to as little as 1.92 seconds on an A100.
Llama 2 is an open-source LLM that is competitive with GPT-3.5, the model that powers ChatGPT, on output quality. Chainlit is an open-source tool for creating ChatGPT-style interfaces. This tutorial shows you how to build one for your favorite open source LLMs, like Llama 2.
Build a ChatGPT-style chatbot with open-source Llama 2 and LangChain in a Python notebook.
Baseten is a HIPAA-compliant MLOps platform for fine-tuning, deploying, and monitoring ML models on secure model infrastructure.
Baseten is a SOC 2 Type II certified and HIPAA compliant platform for fine-tuning, deploying, and serving ML models, LLMs, and AI models.
Baseten is an MLOps platform for model deployment, model serving, and model fine-tuning. It has achieved SOC 2 Type II certification for policies, procedures, and controls related to data security, privacy, and confidentiality.
At the end of 2019, my co-founders and I started Baseten to solve many of the issues we’d faced building machine learning models at companies big and small, as individual contributors and as leads.
See hackathon projects from Baseten for ML infrastructure, inference, user experience, and streaming
Want to host an AI community meetup, but aren’t sure where to start? Julien shares his top 10 tips for successfully hosting an AI meetup.
Meet Daniel, Data Scientist and founding data science team member at SIL
Meet Nikhil, Machine Learning Engineer at Patreon and early data team expert