Model autoscaling features on Baseten
Autoscaling lets your machine learning model automatically spawn or terminate replicas in response to incoming traffic. Baseten offers a robust suite of autoscaling features, including scale to zero and cold starts.
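The core replica-count decision behind features like scale to zero can be sketched as a toy function. The names and scaling rule below are illustrative, not Baseten's actual implementation:

```python
import math

def desired_replicas(requests_per_second: float,
                     capacity_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Toy autoscaling rule: provision enough replicas to cover
    incoming traffic, clamped to [min_replicas, max_replicas].
    With min_replicas=0, the model scales to zero when idle."""
    if requests_per_second <= 0:
        return min_replicas
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(needed, max_replicas))
```

For example, at 12 requests per second with replicas that each handle 5, this rule asks for 3 replicas; with no traffic and `min_replicas=0`, it scales to zero, which is where cold starts come in on the next request.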
Models We Love: June 2023
An in-depth look at open-source foundation models, primarily LLMs: Falcon-7B and Falcon-40B from TII, WizardLM from Microsoft and Peking University, MusicGen from Meta, and MPT-7B from MosaicML.
New in June 2023
LangChain adds Baseten integration, Falcon soars to the top of the LLM leaderboard
Three techniques to adapt LLMs for any use case
Prompt engineering, embeddings with vector databases, and fine-tuning are three ways to adapt large language models (LLMs) to your data and use case.
What I learned from my AI startup’s internal hackathon
See hackathon projects from Baseten spanning ML infrastructure, inference, user experience, and streaming.
Deploy Falcon-40B on Baseten
Deploy Falcon-40B and Falcon-7B, top-ranked open-source LLMs on the Hugging Face leaderboard, to Baseten's production-ready ML infrastructure.
Deploy open-source models in a couple of clicks from Baseten’s model library
An explanation of how Baseten's model library works for deploying and serving popular open-source models.
Getting started with foundation models on Baseten
A summary of foundation models with a focus on data type and scale as well as in-context learning and fine-tuning, using the LLaMA family of models from Meta as an example.
New in May 2023
Discover new models for text generation and text-to-speech, learn more about the GPUs they run on, and plug into the community forming around open-source models.
Understanding NVIDIA’s Datacenter GPU line
This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.