AudioGen: deploy and build today!

TL;DR

AudioGen, part of the AudioCraft family of models from Meta AI, is now available in the Baseten model library. This post gives a high-level overview of AudioGen, shows how to deploy it from the Baseten model library in a couple of clicks, and shares some sample outputs.

AudioGen: a breakthrough in text-to-audio

The AudioCraft family of models from Meta AI includes AudioGen, MusicGen, and EnCodec, which together make up Meta AI's latest open-source foundation models for audio generation. AudioGen was trained on publicly available sound effects and can create a wide range of sounds from simple text prompts. That is a big step forward for text-to-audio generation, because generating high-fidelity audio is a complex task.

Two-click deploy AudioGen and MusicGen

Both AudioGen and MusicGen are currently available in the Baseten model library. You can deploy either (or both!) directly to Baseten by clicking the green button in the top right of the model page. You don't need to work out which instance type to use: we've selected the most efficient GPU for both models on Baseten (in this case, a single NVIDIA A10 GPU).

Screenshot of the AudioGen medium model in the Baseten model library

Learn more about deploying open-source models from Baseten’s model library

AudioGen sample outputs

Once your model is deployed, you can run inference through either the Baseten client or curl. As input, AudioGen takes a list of prompts and a duration in seconds. As output, it generates one clip per prompt and returns each clip as a base64-encoded WAV file.
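To make the input and output format concrete, here is a minimal Python sketch of a request and of decoding the returned clips. It is an illustration under assumptions, not the official client: the JSON key names `prompts` and `duration`, the `Api-Key` authorization header format, and the helper names `build_payload`, `call_model`, and `save_clips` are all hypothetical, and the endpoint URL is left as a parameter. Check your deployed model's page for the actual endpoint and request schema.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_payload(prompts, duration_seconds):
    """Assemble the request body: a list of prompts plus a clip duration.

    The key names "prompts" and "duration" are assumptions for illustration.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {"prompts": list(prompts), "duration": duration_seconds}


def call_model(url, api_key, payload):
    """POST the payload as JSON and return the parsed JSON response.

    The URL and the "Api-Key" header format are assumptions; use the
    values shown for your own deployment.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_clips(encoded_clips, out_dir):
    """Decode each base64 string to WAV bytes and write one file per clip."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(encoded_clips):
        wav_bytes = base64.b64decode(encoded)
        # A WAV file starts with a RIFF header whose format tag is "WAVE".
        if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
            raise ValueError(f"clip {index} does not look like a WAV file")
        path = out_dir / f"clip_{index}.wav"
        path.write_bytes(wav_bytes)
        paths.append(path)
    return paths
```

In use, you would pass something like `build_payload(["small dog barking"], 8)` to `call_model`, pull the list of base64 strings out of the response (its exact shape depends on the deployment), and hand that list to `save_clips` to get one `.wav` file per prompt.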

We’ve started to play around with AudioGen and are impressed by the results! Below are a few of our favorites:

Prompt: footsteps on a wooden floor

Output:

Prompt: small dog barking

Output:

Prompt: man talking, emergency vehicle siren

Output:

Talk to us!

We’d love to learn more about what you’re building, so don’t hesitate to reach out to us via email at hi@baseten.co, or on Twitter, Threads, or LinkedIn. We can’t wait to see what you create!
