NVIDIA's new AI music editor, Fugatto, can make a trumpet meow! Fugatto (short for Foundational Generative Audio Transformer Opus 1) can generate music, sound effects, and even speech by interpreting prompts in text and audio formats. The model doesn't just recreate existing soundscapes; it can produce completely novel ones, transcending the boundaries of traditional audio generation. Imagine a saxophone howling, or dogs barking in sync with electronic beats: Fugatto delivers that level of sonic creativity.
What's New: AI models like those from Stability AI, OpenAI, Google DeepMind, ElevenLabs, and Adobe also offer audio synthesis, but Fugatto's claim to fame is its ability to create truly unheard-of sounds. It expands creative boundaries in ways that other tools simply can’t, giving audio producers a new level of control and spontaneity.
In NVIDIA's demonstration, the Fugatto model, aptly nicknamed the "Swiss Army knife for sound," uses a unique technique called ComposableART. This allows users to control the generated audio attributes—like accent, emotion, or even blending sounds—using free-form prompts. Whether you're a music producer experimenting with new beats or a game developer needing dynamic sound effects, Fugatto’s flexibility seems limitless.
Emergent Capabilities: Fugatto stands apart from existing AI models by showcasing "emergent properties"—abilities that arise from training rather than explicit programming. In practical terms, this means Fugatto can combine, transform, or interpolate between instructions with remarkable fluidity. For example, users could ask it to create a "saxophone barking" or blend the sounds of rain and thunder moving in sync with crescendos. Unlike many AI models that only replicate what they’ve been trained on, Fugatto can generate entirely new soundscapes it hasn’t seen before.
The Bigger Picture: The system's capabilities extend beyond novel sound creation. Music producers could use it to quickly prototype song ideas or edit existing tracks. Video game developers could modify audio assets in real time based on gameplay. The tool even allows for personalization of voice-based applications: imagine language learning software that speaks in the voice of a friend or family member.
Data Sources and Training: According to NVIDIA's research paper, Fugatto was trained on a massive dataset comprising over 50,000 hours of audio from various open-source collections. The training data spans multiple categories, including speech, music, and environmental sounds, with careful attention paid to dataset diversity and quality.
What They’re Saying: "The idea that I can create entirely new sounds on the fly in the studio is incredible," said Ido Zmishlany, multi-platinum producer and co-founder of One Take Audio. NVIDIA's Rafael Valle adds, "Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale."
The tool promises to rewrite the next chapter of music history, akin to the way the electric guitar shaped rock or samplers birthed hip-hop.