Preprint · bioRxiv (Cold Spring Harbor Laboratory) · Feb 21, 2025 · Green OA

Genome modeling and design across all domains of life with Evo 2

Arc Research Institute · Stanford University · +6 more institutions

Indexed in Crossref

Abstract

All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to…
