Stability AI Releases AudioSparx for Variable-Length Music

By Chris McKay February 9, 2024 • 2 min read

Stability AI has released a new text-to-music AI model called AudioSparx that now powers its Stable Audio product. The new model produces high-fidelity, long-form stereo music with significantly more variation and structure than previous state-of-the-art AI music generators.

At the core of AudioSparx 1.0 is a latent diffusion model that can rapidly generate music from text prompts. Unlike prior iterations, which were constrained to 30 seconds of audio, the new model uses an enhanced conditioning system to reliably output stereo music up to 95 seconds long at a CD-quality 44.1 kHz sampling rate.
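To put the 95-second, 44.1 kHz stereo figure in concrete terms, the arithmetic below works out how much raw audio a maximum-length clip contains. This is only a back-of-the-envelope sketch: the 16-bit sample depth is an assumption borrowed from the audio CD format (the article states only the sampling rate), and `pcm_size` is a hypothetical helper, not part of any Stability AI tooling.

```python
SAMPLE_RATE_HZ = 44_100   # from the article: CD-quality sampling rate
CHANNELS = 2              # from the article: stereo output
MAX_DURATION_S = 95       # from the article: maximum reliable clip length
BYTES_PER_SAMPLE = 2      # ASSUMPTION: 16-bit PCM, as used on audio CDs


def pcm_size(duration_s, sample_rate_hz=SAMPLE_RATE_HZ,
             channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (frames, samples, bytes) for uncompressed PCM audio."""
    frames = duration_s * sample_rate_hz   # one frame = one sample per channel
    samples = frames * channels            # individual sample values
    return frames, samples, samples * bytes_per_sample


frames, samples, size_bytes = pcm_size(MAX_DURATION_S)
print(f"{frames:,} frames, {samples:,} samples, {size_bytes / 1e6:.2f} MB")
```

Under these assumptions a full-length clip is roughly 4.2 million frames per channel, which helps explain why models of this kind generate in a compressed latent space rather than predicting raw samples directly.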

Audio sample prompt: "Berlin techno rave drum machine kick ARP synthesizer dark moody hypnotic evolving 135 bpm"

Audio sample prompt: "Calm meditation music to play in a spa lobby"

Crucially, AudioSparx 1.0 appears capable of mimicking the overall form and progression of complete songs in a way unmatched by competitors. The generated tracks contain recognizable introductions, verse/chorus patterns, transitions, instrumental breaks, and conclusions. This musicality demonstrates a refined understanding of fundamental song structure.

Beyond music, AudioSparx 1.0 is the first AI system to realistically produce 44.1 kHz stereo sound effects from text prompts. Users can request sounds like "outdoors forest with birds chirping" and receive immersive binaural audio. Adding "high-quality, stereo" to a prompt yields the best results.

Audio sample prompt: "Sports car passing by high quality stereo"

Audio sample prompt: "Fireworks high quality stereo"
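The prompt tip above (appending "high quality" and "stereo" to a sound-effect request) can be sketched as a small helper. This is an illustrative assumption about how one might prepare prompts before submitting them; `augment_prompt` and `QUALITY_TAGS` are hypothetical names and do not correspond to any Stable Audio API.

```python
QUALITY_TAGS = ("high quality", "stereo")  # wording taken from the sample prompts


def augment_prompt(prompt, tags=QUALITY_TAGS):
    """Append any quality tags the prompt does not already contain."""
    cleaned = " ".join(prompt.split())            # collapse stray whitespace
    lowered = cleaned.lower().replace("-", " ")   # treat "high-quality" as "high quality"
    missing = [tag for tag in tags if tag not in lowered]
    return " ".join([cleaned, *missing]) if missing else cleaned


print(augment_prompt("Fireworks"))
print(augment_prompt("Sports car passing by high quality stereo"))
```

The check is a plain substring match, so it is deliberately simple: a prompt that already carries the tags is returned unchanged, and one that lacks them gets only the missing tags appended.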

By handling both variable-length music and sound-effect generation, AudioSparx 1.0 consolidates multiple audio synthesis capabilities into a single model. This unified competence stems from Stability AI's general training procedure, which does not strictly differentiate between musical and non-musical audio sources.

Overall, the techniques underpinning AudioSparx 1.0 promise to give creative professionals an adaptable tool for audio production. The model's capacity to deliver long, elaborately arranged music and sounds surpasses previous benchmarks, meeting demands that once could be met only by manual production. It highlights Stability AI's commitment to pushing AI toward matching human capabilities.

Resources:
- Code
- Demo
- Research Paper

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.