AudioGen: deploy and build today!


AudioGen, part of the AudioCraft family of models from Meta AI, is now available in the Baseten model library. This post gives a high-level overview of AudioGen, shows how to deploy it quickly from the Baseten model library, and shares some sample outputs.

AudioGen: a breakthrough in text-to-audio

The AudioCraft family of models from Meta AI includes AudioGen, MusicGen, and EnCodec, which together make up Meta AI's latest open-source foundation models for audio generation. AudioGen was trained on publicly available sound effects and can create a wide array of sounds from simple text inputs. That is a big step forward for text-to-audio generation, given that generating high-fidelity audio is a complex task.

Two-click deploy AudioGen and MusicGen

Both AudioGen and MusicGen are available in the Baseten model library. You can deploy either (or both!) directly to Baseten by clicking the green button in the top right of the model page. There’s no need to figure out which instance type you need: we’ve selected the most efficient GPU for both models on Baseten (in this case, a single NVIDIA A10).

Screenshot of the AudioGen medium model in the Baseten model library

Learn more about deploying open-source models from Baseten’s model library

AudioGen sample outputs

Once your model is deployed, you can run inference through either the Baseten client or curl. AudioGen takes a list of prompts and a duration in seconds as input and generates one clip per prompt, returning each clip as a base64-encoded WAV file.
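To make the input and output shape concrete, here is a minimal Python sketch of that request and response handling. It is not the official client: the field names "prompts", "duration", and "data", and the "Api-Key" authorization header, are assumptions for illustration, and the endpoint URL is left for you to supply. Check your deployed model's page for the exact values.

```python
import base64
import json
import urllib.request


def build_payload(prompts, duration_s):
    """Build the request body: a list of prompts and a duration in seconds.

    The keys "prompts" and "duration" are assumed names, not confirmed ones.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return {"prompts": list(prompts), "duration": duration_s}


def decode_clips(response_body):
    """Decode one base64-encoded WAV clip per prompt into raw bytes.

    Assumes the clips arrive as a list under a "data" key (hypothetical).
    """
    return [base64.b64decode(clip) for clip in response_body["data"]]


def save_clips(clips, prefix="clip"):
    """Write each decoded clip to <prefix>_<index>.wav and return the paths."""
    paths = []
    for index, wav_bytes in enumerate(clips):
        path = f"{prefix}_{index}.wav"
        with open(path, "wb") as wav_file:
            wav_file.write(wav_bytes)
        paths.append(path)
    return paths


def generate(url, api_key, prompts, duration_s):
    """POST the payload to the model endpoint and return decoded WAV bytes.

    `url` is the predict endpoint of your deployment; the header format is
    an assumption and may differ for your account.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompts, duration_s)).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_clips(json.load(response))
```

With a deployed model, a call such as save_clips(generate(url, api_key, ["small dog barking"], 5)) would write one WAV file per prompt. Keeping the payload and decoding steps in separate functions makes it easy to adjust the field names if your deployment uses different ones.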

We’ve started to play around with AudioGen and are impressed by the results! Below are a few of our favorites:

Prompt: footsteps on a wooden floor


Prompt: small dog barking


Prompt: man talking, emergency vehicle siren


Talk to us!

We’d love to learn more about what you’re building, so don’t hesitate to reach out to us via email, or on Twitter, Threads, or LinkedIn. We can’t wait to see what you create!
