Preprint, bioRxiv (Cold Spring Harbor Laboratory), Jan 15, 2023, Green OA

The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics

Nvidia (United Kingdom)

Indexed in Crossref

Abstract

Closing the gap between measurable genetic information and observable traits is a longstanding challenge in genomics. Yet, the prediction of molecular phenotypes from DNA sequences alone remains limited and inaccurate, often driven by the scarcity of annotated data and the inability to transfer learnings between prediction tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named the Nucleotide Transformer, ranging from 50M up to 2.5B parameters and integrating information from 3,202 diverse human genomes, as well as 850 genomes selected across diverse phyla, including both model and non-model organisms. These transformer models yield transferable, context-specific…

Citation impact

Total citations: 193
References: 55

Authors

15

Topics & keywords

Keywords
  • Genomics
  • Computational biology
  • Computer science
  • Transformer
  • Genome
  • Biology
  • Gene
  • Genetics