
Inception Labs, a startup founded by Stanford professor Stefano Ermon, has introduced Mercury Coder, the first large-scale diffusion-based language model (dLLM). Unlike traditional large language models (LLMs), which generate text sequentially, Mercury Coder processes entire sequences simultaneously using a diffusion approach—similar to AI image and video generation. The result, the company claims, is a model up to ten times faster and significantly cheaper to run than existing models.
Key Points:
- Mercury Coder is the first diffusion-based large language model (dLLM), generating text using a coarse-to-fine approach rather than predicting tokens sequentially.
- It offers speeds up to 10x faster than traditional LLMs, generating over 1,000 tokens per second on NVIDIA H100 GPUs.
- Early benchmarks suggest Mercury Coder rivals models like GPT-4o Mini and Claude 3.5 Haiku while being more cost-efficient.
Rather than predicting one token at a time, Mercury Coder starts with a rough, noisy draft of the full output and refines it over several parallel passes—the same denoising principle behind image and video generators like Midjourney and OpenAI's Sora.
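Inception Labs has not published Mercury Coder's architecture, but the coarse-to-fine idea can be illustrated with a generic masked-diffusion decoding loop: begin with every position masked, have the model predict all positions in parallel each step, and commit only the most confident predictions, leaving the rest for the next pass. The toy `toy_denoiser` below is a stand-in for the learned model (it simply reveals a fixed target with made-up confidence scores); the names and the commit schedule are illustrative assumptions, not Inception's method.

```python
import random

MASK = "<mask>"

def toy_denoiser(draft, target):
    # Stand-in for the learned denoising model: "predicts" the target
    # token at each masked position with a made-up confidence score.
    # Already-committed tokens keep confidence 1.0.
    return [(tok if tok != MASK else target[i],
             1.0 if tok != MASK else random.random())
            for i, tok in enumerate(draft)]

def diffusion_decode(target, steps=4, seed=0):
    """Coarse-to-fine decoding sketch: start fully masked, then at each
    step predict every position in parallel and commit only the most
    confident predictions, re-masking the rest for the next pass."""
    random.seed(seed)
    draft = [MASK] * len(target)
    for step in range(steps):
        preds = toy_denoiser(draft, target)
        # Commit a growing fraction of positions each step;
        # by the final step the quota covers the whole sequence.
        quota = max(1, len(target) * (step + 1) // steps)
        ranked = sorted(range(len(preds)), key=lambda i: -preds[i][1])
        keep = set(ranked[:quota])
        draft = [preds[i][0] if i in keep else MASK
                 for i in range(len(preds))]
    return draft

tokens = "def add ( a , b ) : return a + b".split()
print(diffusion_decode(tokens))
```

The key contrast with autoregressive decoding is that each pass touches every position at once, so the number of model calls is the (small, fixed) number of refinement steps rather than the sequence length.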
According to Inception Labs, Mercury Coder is not just different—it's dramatically faster. The company claims the model can generate over 1,000 tokens per second on NVIDIA H100s, a rate that typically requires specialized hardware accelerators like those from Groq or Cerebras. The approach also reduces computational costs, making it a compelling option for enterprises looking to optimize AI infrastructure.
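To put the throughput claim in concrete terms, the sketch below converts decode rates into wall-clock latency for a single completion. The 1,000 tokens/second figure comes from the article; the ~100 tokens/second baseline for a conventional autoregressive model on the same hardware is an illustrative assumption, not a measured number.

```python
def completion_latency(num_tokens, tokens_per_sec):
    """Wall-clock seconds to generate a completion at a given decode rate."""
    return num_tokens / tokens_per_sec

# Mercury's claimed rate (from the article) vs. an assumed baseline.
rates = [("Mercury Coder (claimed)", 1000),
         ("typical autoregressive LLM (assumed)", 100)]
for name, rate in rates:
    secs = completion_latency(500, rate)
    print(f"{name}: {secs:.1f} s for a 500-token reply")
```

Under these assumptions, a 500-token reply drops from roughly five seconds to half a second—the kind of gap that matters for interactive uses like code completion and customer support.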
Early benchmarks suggest Mercury Coder's performance holds up against leading LLMs. In head-to-head coding evaluations, the model matched or outperformed speed-optimized models like OpenAI's GPT-4o Mini and Anthropic's Claude 3.5 Haiku while running at a fraction of their latency. If these results hold in real-world applications, dLLMs could offer a viable alternative to traditional LLMs, particularly in scenarios requiring high-speed responses, such as customer support, code generation, and enterprise automation.
Industry leaders are taking notice. AI researcher Andrej Karpathy noted that Mercury Coder’s diffusion approach is an intriguing departure from the norm, stating, “It’s been a mystery why text generation has resisted diffusion while image and video generation have embraced it. This model could reveal new strengths and weaknesses in AI text generation.”
For now, Inception Labs is positioning Mercury Coder as a drop-in alternative for existing models, offering API access and on-premise deployments. The company is already working with Fortune 100 enterprises looking to reduce AI latency and cost. Inception also hints at future dLLM releases, including models optimized for conversational AI.
Whether diffusion-based LLMs become a serious competitor to traditional models remains to be seen. But with Mercury Coder, Inception Labs is making a compelling case that AI text generation doesn't have to be limited by the sequential architecture of today's dominant models.