The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza‐Revilla, Nicolás López Carranza, Adam Henryk Grywaczewski
Indexed in Crossref
Abstract
Closing the gap between measurable genetic information and observable traits is a longstanding challenge in genomics. Yet, the prediction of molecular phenotypes from DNA sequences alone remains limited and inaccurate, often driven by the scarcity of annotated data and the inability to transfer learnings between prediction tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named the Nucleotide Transformer, ranging from 50M up to 2.5B parameters and integrating information from 3,202 diverse human genomes, as well as 850 genomes selected across diverse phyla, including both model and non-model organisms. These transformer models yield transferable, context-specific…
Citation impact
- Total citations: 193
- FWCI: n/a
- Percentile: n/a
- References: 55
Authors: 15
Topics & keywords
Topics
Keywords
- Genomics
- Computational biology
- Computer science
- Transformer
- Genome
- Biology
- Gene
- Genetics