Meta Open Sources AudioCraft

August 2, 2023

Meta has open-sourced AudioCraft, a framework for generating high-quality, realistic audio and music from text prompts. Because its models are trained on raw audio signals rather than symbolic representations such as MIDI or piano rolls, AudioCraft can give professional musicians, indie game developers, small business owners, and others an easy-to-use tool for their audio needs.

AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen generates music from text prompts, while AudioGen generates sound effects and ambient sounds such as bird calls or passing cars. EnCodec is the core of AudioCraft: it transforms raw audio into discrete tokens that MusicGen and AudioGen can then model.
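As a rough illustration of what "transforming raw audio into discrete tokens" means, here is a toy sketch in Python. It is not EnCodec: the real model is a learned neural codec, while this sketch swaps in a fixed two-stage scalar quantizer. The function names and step sizes are made up for illustration.

```python
import math

# Toy stand-in for an audio tokenizer. NOT EnCodec: a fixed two-stage
# scalar quantizer with hypothetical step sizes, for illustration only.

def quantize(value, step):
    """Return the integer index of the multiple of `step` nearest to `value`."""
    return round(value / step)

def encode(samples, coarse_step=0.25, fine_step=0.05):
    """Map each sample in [-1, 1] to a (coarse, fine) pair of integer tokens."""
    tokens = []
    for s in samples:
        coarse = quantize(s, coarse_step)
        # The second stage encodes what the first stage left over.
        residual = s - coarse * coarse_step
        fine = quantize(residual, fine_step)
        tokens.append((coarse, fine))
    return tokens

def decode(tokens, coarse_step=0.25, fine_step=0.05):
    """Rebuild an approximate waveform from the integer tokens."""
    return [c * coarse_step + f * fine_step for c, f in tokens]

# A few samples of a 440 Hz sine wave at a 16 kHz sample rate.
wave = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(8)]
tokens = encode(wave)
approx = decode(tokens)
```

The waveform becomes a sequence of small integers that a sequence model could be trained to predict, and decoding those integers gives back an approximation of the original samples.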

According to Meta, earlier generative AI models for audio have been complicated and not very open. AudioCraft aims to change that with a simplified, unified architecture. Meta says the models capture long-term audio patterns while generating coherent, expressive samples, and that users can adapt AudioCraft to their own datasets and tasks.

Two models in particular, AudioGen and MusicGen, illustrate the impressive capabilities of AudioCraft. AudioGen can generate environmental sounds based on a textual description of an acoustic scene, such as "whistling with wind blowing" or "sirens and a humming engine approach and pass".

Text Prompt: Whistling with wind blowing

[Audio sample, about 5 seconds]

Text Prompt: Sirens and a humming engine approach and pass

[Audio sample, about 5 seconds]

MusicGen, on the other hand, is designed specifically for music generation. When provided with text prompts such as "pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach", it can generate corresponding music tracks.

Text Prompt: Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach

[Audio sample, about 30 seconds]

Text Prompt: Earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves

[Audio sample, about 30 seconds]
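For readers who want to try prompts like these, the following is a minimal sketch of driving MusicGen from Python. It follows the usage pattern shown in the AudioCraft repository's README, so treat it as an assumption-laden starting point rather than a definitive recipe: the checkpoint name `facebook/musicgen-small`, the 8-second duration, and the helpers `slugify` and `generate_music` are illustrative choices, and the `audiocraft` package must be installed separately.

```python
import re

def slugify(prompt, max_len=40):
    """Turn a text prompt into a short, filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-")

def generate_music(prompts, duration_s=8, model_name="facebook/musicgen-small"):
    """Generate one clip per prompt and write each one to an audio file.

    Assumes the `audiocraft` package is installed; the checkpoint name
    and duration are illustrative defaults, not recommendations.
    """
    # Imported lazily so slugify() works even without audiocraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_s)
    wavs = model.generate(prompts)  # one waveform tensor per prompt

    stems = []
    for prompt, wav in zip(prompts, wavs):
        stem = slugify(prompt)
        # audio_write adds the file extension to the stem itself.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        stems.append(stem)
    return stems
```

Calling `generate_music(["Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach"])` would download the checkpoint on first use and write one clip named after the prompt.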

By open-sourcing AudioCraft's code under the MIT license, Meta aims to advance audio AI and to address issues such as bias through community input. Meta trained MusicGen on licensed music, but the dataset lacks diversity: it is weighted toward Western genres, and its metadata is in English. By sharing AudioCraft openly, Meta hopes researchers can mitigate such limitations.

Meta envisions AudioCraft as a tool for musicians and sound designers, providing inspiration and helping them quickly iterate on their compositions. The company also believes that generative AI like AudioCraft could drastically improve the speed of iteration and feedback during early prototyping stages, benefiting AAA developers, musicians, and small or medium-sized business owners alike.

Through this release, Meta continues its push toward transparent and accessible AI. With an easy-to-use foundation now public, innovators across domains can tap into generative audio. As Meta works to further improve AudioCraft, collaborative research enabled by open sourcing will help realize the technology's immense creative potential.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
