Open-Source Evo 2 AI Maps Genome Features Across Life

Ars Technica reports on Evo 2, an open-source large genome model trained on 8.8 trillion bases from bacteria, archaea, eukaryotes, and related viruses. The model can identify genes, regulatory DNA, and splice sites without task-specific tuning. Built on the StripedHyena 2 convolutional neural network, Evo 2 was trained in two stages, first on short, feature-rich segments and then on long-range sequences, and was released with model weights, training and inference code, and the OpenGenome2 dataset. It shows strong genome-annotation capabilities, recognizing features across domains of life as well as the effects of some mutations, but its ability to design functional new proteins remains unproven, and early tests of regulatory sequence activity yielded only modest results. The researchers anticipate many possible uses and further specialization, with the code and data open for community exploration.
- Large genome model: Open source AI trained on trillions of bases (Ars Technica)
- Genome modelling and design across all domains of life with Evo 2 (Nature)
- With Evo 2, AI can model and design the genetic code for all domains of life (Phys.org)
- Evo 2: The AI That Learned to Read DNA Across All Life on Earth — And It's Already Finding Things We Missed (Technology Org)
- AI model trained on 9.3 trillion base pairs can now design novel genes (동아사이언스)