NVIDIA and Arc Institute Introduce Evo 2, A State of the Art Foundation Model for Biology

Today, the Arc Institute and NVIDIA introduced Evo 2, a 40-billion-parameter AI model trained on 9.3 trillion DNA base pairs from more than 128,000 species spanning every domain of life. Its ability to analyze and generate DNA sequences marks a significant advance in synthetic biology and genome design.

Key Points

  • Trained on 9.3 trillion DNA base pairs from more than 128,000 species.
  • Utilizes the novel StripedHyena 2 architecture for efficient, large-scale processing.
  • Fully open-source and accessible via NVIDIA’s BioNeMo platform and DGX Cloud on AWS.
  • Demonstrates robust performance in mutation impact analysis and synthetic biological design.

Behind this achievement is the new StripedHyena 2 architecture. Departing from traditional Transformer models, StripedHyena 2 employs convolutional multi-hybrid techniques that not only accelerate training but also reduce perplexity when handling lengthy genomic sequences. This architectural breakthrough makes it possible to connect distant genetic signals, a capability that could enhance everything from precision medicine to agricultural biotechnology.

Scientists have already begun to explore Evo 2’s potential. In early tests, the model demonstrated state-of-the-art accuracy in classifying variants of BRCA1, a gene closely linked to breast cancer, predicting the impact of previously uncharacterized mutations with 90% accuracy. “Evo 2 represents a major milestone for generative genomics,” said Patrick Hsu, Arc Institute cofounder and bioengineering professor at UC Berkeley. His remarks underscore the model’s promise not only for academic research but also for practical applications in healthcare and environmental science.
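To give a rough sense of what scoring a mutation with a sequence model can look like, the sketch below compares the log-likelihood of a reference sequence with that of a single-base variant. This is an illustration only: the announcement does not specify how Evo 2 scores variants, and the tiny k-mer model, the function names (`train_kmer_model`, `log_likelihood`, `variant_score`), and the toy sequences are all hypothetical stand-ins, not Evo 2 or any BioNeMo API.

```python
import math
from collections import Counter

BASES = "ACGT"

def train_kmer_model(sequences, k=3):
    """Count which base follows each k-base context (toy stand-in for a genomic model)."""
    counts = {}
    for seq in sequences:
        for i in range(k, len(seq)):
            context = seq[i - k:i]
            counts.setdefault(context, Counter())[seq[i]] += 1
    return counts

def log_likelihood(model, seq, k=3, alpha=1.0):
    """Sum of log P(base | previous k bases), with add-alpha smoothing over the 4 bases."""
    total = 0.0
    for i in range(k, len(seq)):
        context_counts = model.get(seq[i - k:i], Counter())
        numerator = context_counts[seq[i]] + alpha
        denominator = sum(context_counts.values()) + len(BASES) * alpha
        total += math.log(numerator / denominator)
    return total

def variant_score(model, reference, position, alt_base, k=3):
    """Log-likelihood of the variant minus that of the reference.

    Negative values mean the model finds the variant less plausible than the reference.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(model, variant, k) - log_likelihood(model, reference, k)

# Toy data (hypothetical): the "genome" is just a repeating ACGT motif.
model = train_kmer_model(["ACGTACGTACGTACGT"] * 5)
reference = "ACGTACGTACGT"

# Substituting C -> T at index 5 breaks the motif the model was trained on.
score = variant_score(model, reference, 5, "T")
print(f"variant score: {score:.3f}")
```

In this toy setup, a substitution that disrupts the learned pattern gets a negative score, while "substituting" the reference base itself scores exactly zero. A real genomic model would replace the k-mer counts with learned probabilities over far longer contexts.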

Accessible through NVIDIA’s BioNeMo platform and supported by NVIDIA DGX Cloud on AWS, Evo 2 is available as an open-source tool. This accessibility is intended to democratize high-end genomic research, allowing developers and researchers worldwide to tap into a resource once confined to elite laboratories. The model’s release comes with comprehensive documentation, complete training codebases, and a suite of inference tools designed to facilitate its integration into diverse scientific workflows.

While Evo 2 opens the door to new insights and faster discovery cycles, it also brings a host of ethical, safety, and technical considerations. Researchers will need to carefully weigh the implications of generating and manipulating genomic sequences, ensuring that advancements in synthetic biology are paired with robust safeguards.

As biology increasingly becomes a computational science, tools like Evo 2 represent a shift toward more systematic approaches to understanding and engineering biological systems. Whether this leads to more resilient crops, novel therapeutics, or solutions for environmental challenges remains to be seen. What's clear is that the intersection of AI and biology continues to be a space worth watching closely.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
