Amazon Voice AI

Amazon Unveils Nova Sonic, Its Most Humanlike Voice AI Yet

April 8, 2025 • 2 min read

Amazon has introduced Nova Sonic, a new foundation model built to make voice-based AI apps more natural, responsive, and cost-effective. The company says that Nova Sonic rivals, and in some cases beats, OpenAI’s GPT-4o and Google’s Gemini Flash.

Key Points:

Nova Sonic unifies speech recognition, generation, and understanding in one model
Amazon reports that it beats OpenAI’s GPT-4o and Google’s Gemini Flash on key accuracy and latency benchmarks
Supports tool use, accents, real-time barge-ins, and nuanced turn-taking
Powers the upgraded Alexa+ and is, according to Amazon, nearly 80% cheaper than GPT-4o (Realtime)

Like those other multimodal voice models, Nova Sonic understands what you’re saying and how you’re saying it—capturing nuances like tone, rhythm, hesitation, and even interruptions. This lets it respond in a way that feels more like talking to a human and less like issuing commands to a robot.

"When it comes to conversation, words have meaning, but words alone can fall flat without acoustic context that give them depth," Amazon explained in its announcement. This approach allows the model to adapt its responses based on acoustic context, including handling natural pauses and interruptions—a feature Amazon calls "barge-ins."

The company is positioning Nova Sonic directly against OpenAI's GPT-4o (Realtime) and Google's Gemini Flash 2.0. Amazon says Nova Sonic achieves a 51% win rate against OpenAI's model and nearly 70% against Google's in conversational quality tests.

On the Multilingual LibriSpeech benchmark, Nova Sonic reportedly achieved a 4.2% word error rate across five languages, which Amazon says is 36.4% better than OpenAI's GPT-4o Transcribe model. For noisy environments with multiple speakers—the kind that typically confound voice systems—Amazon claims a 46.7% relative improvement.
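For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The Python sketch below illustrates that standard definition and shows what a relative improvement implies about the comparison score. It is not Amazon's evaluation code: the function names and sample sentences are made up for illustration, and reading "36.4% better" as a relative reduction in WER is an assumption, since the article does not state the competing model's score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def implied_baseline(wer: float, relative_improvement: float) -> float:
    """Baseline WER implied by a score and its relative improvement.

    Assumes improvement = (baseline - wer) / baseline, which is one
    common reading of "X% better" and is not confirmed by the article.
    """
    return wer / (1.0 - relative_improvement)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion in six words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.1%}")  # 33.3%
    # Figures reported in the article: 4.2% WER, 36.4% better.
    print(f"{implied_baseline(0.042, 0.364):.1%}")  # 6.6%
```

Under that assumption, a 4.2% WER that is 36.4% better would correspond to a comparison score of roughly 6.6%. That figure is derived here and was not reported by Amazon.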

Finally, the company reports an average perceived latency of 1.09 seconds from when a user stops speaking to when Nova Sonic starts responding. According to benchmarking by Artificial Analysis, that is about 0.09 seconds faster than OpenAI's 1.18 seconds and 0.32 seconds faster than Google's 1.41 seconds.

Amazon also points out that Nova Sonic is "nearly 80% less expensive than OpenAI's GPT-4o (Realtime)," which could give it a competitive edge as businesses look to deploy these technologies at scale.

Early adopters include education company EF, which is using Nova Sonic to help students practice new vocabulary and improve pronunciation. "The model is capable of accurately understanding non-native English speakers with a variety of accents," said Tim Hesse, VP of AI and Data at EF.

Currently, Nova Sonic offers both masculine and feminine voices in American and British English accents, with Amazon promising additional languages and accents soon. The model is available through Amazon Bedrock, the company's generative AI service on AWS, via a new bi-directional streaming API.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.