
Remember how DeepMind's AlphaFold cracked the protein folding problem and nabbed a Nobel Prize? Well, they're back with an even gnarlier challenge: figuring out what the hell 98% of your DNA actually does. Meet AlphaGenome, the AI that wants to make sense of our genetic code.
Key Points:
- AlphaGenome predicts how mutations in non-coding DNA affect gene regulation across cell types.
- It matches or beats the best existing models on 24 of 26 variant effect prediction benchmarks, including splicing and chromatin accessibility.
- The model is available via API for researchers and may accelerate discovery of disease-causing variants.
Scientists have been able to read our entire genetic code for over two decades, but they still don't know what most of it does. It's like having the complete works of Shakespeare in a language nobody speaks. AlphaGenome predicts, more comprehensively and accurately than earlier models, how single variants or mutations affect gene regulation, especially in non-coding regions of DNA.
Why does this matter? Because most genetic variants linked to disease lie outside protein-coding genes. Instead, they affect how genes are regulated: where they’re turned on or off, how RNA is spliced, or which proteins can bind DNA. Previous models tried to capture these effects, but they hit tradeoffs—long sequence input or high resolution, not both; broad modality coverage or specialized accuracy, rarely all at once.
Only about 2% of the human genome codes for proteins, the molecules whose structures AlphaFold excels at predicting. The remaining 98%, called non-coding regions, are crucial for orchestrating gene activity and contain many variants linked to diseases. Scientists have been calling this the "dark matter" of the genome, and until now, we've been mostly stumbling around in the dark.
AlphaGenome addresses this with a hybrid neural network that combines convolutional layers and transformers. It digests up to one million base pairs of DNA at once and delivers base-pair-level predictions on gene expression, splicing, chromatin accessibility, transcription factor binding, and 3D genome structure.
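To make the "convolutional layers" part a bit more concrete, here is a toy sketch of how sequence models typically see DNA: each base becomes a one-hot vector, and a filter slides along the sequence to produce one score per base. This is not AlphaGenome's code or architecture. The function names, the hand-written "TAT" filter, and the example sequence are all made up for illustration, and a real model learns thousands of filters and stacks transformer layers on top to capture long-range effects.

```python
# Toy illustration only: NOT AlphaGenome's architecture, weights, or API.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in BASES:  # unknown bases such as N stay all-zero
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d_same(encoded, kernel):
    """Slide one filter along the sequence with zero padding.

    The output has exactly one score per input base, which is what
    "base-pair-level" resolution means in this toy setting.
    """
    width = len(kernel)
    pad = width // 2
    scores = []
    for center in range(len(encoded)):
        total = 0.0
        for k in range(width):
            pos = center + k - pad
            if 0 <= pos < len(encoded):  # positions off the ends count as zeros
                total += sum(x * w for x, w in zip(encoded[pos], kernel[k]))
        scores.append(total)
    return scores


# Hypothetical filter: a hand-written detector for the 3-base pattern "TAT".
# A trained model would learn its filter weights from data instead.
motif_filter = one_hot("TAT")

sequence = "GGTATAGG"
scores = conv1d_same(one_hot(sequence), motif_filter)
best = scores.index(max(scores))
print(scores)
print(best)  # the strongest match is centered on index 3, the "A" in "TAT"
```

The point of the sketch is the shape of the computation: sequence in, one number per base out. Scaling that idea to a million bases and thousands of output tracks is where the engineering gets hard.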
"We have, for the first time, created a single model that unifies many different challenges that come with understanding the genome," says Pushmeet Kohli, a vice president for research at DeepMind. And he's not exaggerating. In performance tests, AlphaGenome beat or matched the best external models on 24 out of 26 variant effect prediction benchmarks. That includes outperforming specialized models like SpliceAI and ChromBPNet at their own tasks. It’s also fast—scoring a variant takes less than a second on an H100 GPU.
What’s especially novel is its handling of splice junctions, the cut-and-paste points in RNA that can cause diseases like spinal muscular atrophy when they go wrong. AlphaGenome predicts not just whether a splice site exists, but how it’s used and how mutations affect that usage. It also reproduced known disease mechanisms, like a cancer-associated mutation that activates the TAL1 gene by inserting a MYB binding site.
DeepMind is offering AlphaGenome through a preview API for non-commercial research. That opens the door for labs around the world to score variants, simulate edits, and probe disease mechanisms faster than ever. A full model release is planned later.
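What does "scoring a variant" actually involve? A common recipe for sequence models is to run the model twice, once on the reference sequence and once with the mutation applied, and compare the two predictions. The sketch below shows that recipe with a deliberately silly stand-in for the model. It does not use the AlphaGenome API: `toy_predict`, the motif "AACGG", and the example sequence are hypothetical placeholders, not real biology or real client calls.

```python
# Toy illustration of reference-vs-alternate variant scoring.
# Nothing here calls the real AlphaGenome API; the "model" is a made-up stand-in.

def apply_snv(sequence, position, ref, alt):
    """Return the sequence with a single-base substitution at a 0-based position."""
    if sequence[position] != ref:
        raise ValueError(
            f"expected {ref!r} at position {position}, found {sequence[position]!r}"
        )
    return sequence[:position] + alt + sequence[position + 1:]


def toy_predict(sequence):
    """Hypothetical stand-in for a trained model.

    It pretends that predicted expression equals the number of times an
    arbitrary, made-up activator motif appears in the sequence.
    """
    motif = "AACGG"  # arbitrary placeholder, not a real binding site
    hits = 0
    for i in range(len(sequence) - len(motif) + 1):
        if sequence[i:i + len(motif)] == motif:
            hits += 1
    return float(hits)


def score_variant(sequence, position, ref, alt, predict=toy_predict):
    """Score a variant as (prediction with the variant) - (prediction without it)."""
    reference_prediction = predict(sequence)
    alternate_prediction = predict(apply_snv(sequence, position, ref, alt))
    return alternate_prediction - reference_prediction


reference = "TTGAACCGTT"
# Changing C to G at position 6 creates the made-up motif in this toy sequence.
effect = score_variant(reference, position=6, ref="C", alt="G")
print(effect)  # positive: the toy model predicts the variant raises expression
```

Swap the toy predictor for a real model and the positive or negative difference becomes the predicted effect of the mutation, which is the kind of number researchers would then try to confirm in the lab.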
The technical achievements are impressive, sure, but what actually matters is what this means for real people with real medical problems. Dr. Caleb Lareau, a researcher at Memorial Sloan Kettering Cancer Center, called it "a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks."
Of course, we're not there yet. Understanding how genetic variations lead to complex traits or diseases, which often involve broader biological processes like developmental and environmental factors, is beyond the direct scope of the model. AlphaGenome can tell you that a mutation will dial up gene expression, but it can't predict if that'll give you diabetes or just make you really good at metabolizing caffeine.
Still, for a field that's been stuck reading genetic tea leaves, this is huge. "This is one of the most fundamental problems not just in biology — in all of science," Kohli, who heads AI for science at DeepMind, said at a press briefing.
For now, AlphaGenome gives researchers something they’ve never had: a single, state-of-the-art model to predict how a stretch of DNA might affect everything from gene splicing to chromatin looping. It’s like having a digital microscope for the genome—except this one sees the invisible instructions.