Cartesia Raises $64M to Advance Real-Time Voice AI with Sonic 2.0

By Chris McKay, March 11, 2025

Voice AI is getting faster, smarter, and more natural. Cartesia, a company specializing in real-time AI-driven voice technology, has raised $64 million in a Series A round led by Kleiner Perkins. This funding will accelerate research, expand the team, and refine Sonic 2.0, its latest voice model, which boasts industry-leading latency and voice cloning capabilities.

Key Points:

$64M Series A funding led by Kleiner Perkins, supporting team growth and AI advancements.
Sonic 2.0 model achieves 90 ms latency for full models and 40 ms for real-time applications.
Superior voice cloning that captures complex accents and fine-tunes voice styles.
Enterprise-grade infrastructure with 99.9% uptime and on-device deployment options.

Sonic 2.0 is designed to generate ultra-realistic, low-latency speech, making it well suited to conversational AI, creative content production, and real-time communication. The model is built on a state space architecture and is twice the size of its predecessor, yet it runs faster and more efficiently. It delivers 90-millisecond latency for full models and an even faster 40 milliseconds in real-time applications, figures that outpace competitors.

Beyond speed, Cartesia’s technology excels in voice cloning, generating lifelike speech that captures subtle nuances, accents, and tonal variations. This makes it particularly useful where precision is critical, such as customer service, content localization, and accessibility tools. The company has also introduced Sonic Turbo, an enhanced version aimed at delivering even faster synthesis.

Cartesia’s infrastructure is built for enterprise reliability, with 99.9% uptime and compliance with SOC 2 and HIPAA standards. The Sonic API is aimed at developers, offering robust real-time performance and on-device deployment, which could make AI-driven voice applications more seamless across industries.
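For a sense of scale, the 99.9% uptime figure can be converted into a downtime allowance. The short Python sketch below is plain arithmetic, not Cartesia's service-level terms: the function name and the 30-day and 365-day measurement windows are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage.

    Hypothetical helper for illustration; the measurement window is an assumption.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        minutes = allowed_downtime_minutes(99.9, days)
        print(f"{label}: {minutes:.1f} minutes of downtime allowed")
```

Under these assumptions, 99.9% uptime allows roughly 43 minutes of downtime in a 30-day month, or just under nine hours over a year.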

CEO Karan Goel emphasized that voice AI is poised to become ubiquitous, with real-time AI-generated voices increasingly powering applications from call centers to virtual assistants. “This is the year of voice AI, and it’s going to be everywhere,” he said during the announcement.

With this funding, Cartesia aims to further refine its voice AI models, add new features such as a voice changer and infill editing, and advance its streaming architectures and on-device inference. As the AI race intensifies, Cartesia’s focus on speed, control, and naturalness could position it as a key player in the evolving voice AI ecosystem.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.