Genome modeling and design across all domains of life with Evo 2
Arc Research Institute · Stanford University · +6 more institutions
Abstract
All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to…
Citation impact
- FWCI: not available
- Percentile: not available
- References: 109
Authors
- 52
Topics & keywords
- Evolutionary biology
- Computational biology
- Biology