Sequence modeling and design from molecular to genome scale with Evo
Palo Alto Institute · Arc Research Institute · +5 more institutions
Abstract
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale…
Citation impact
- FWCI: 83.07
- Percentile: 100%
- References: 136
Authors (20)
- Eric Nguyen (corresponding author): Palo Alto Institute, Arc Research Institute, Stanford University
- Michael Poli (corresponding author): Together, Stanford University
- Matthew G. Durrant (corresponding author): Palo Alto Institute, Arc Research Institute
- Brian Kang (corresponding author): Palo Alto Institute, Arc Research Institute, Stanford University
- Dhruva Katrekar (corresponding author): Palo Alto Institute, Arc Research Institute
Topics & keywords
- Biology
- Genome
- Computational biology
- DNA sequencing
- Transposable element
- Genetics
- Context (archaeology)
- RNA