Article · Nature Biotechnology · May 23, 2025 · Hybrid OA

Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS

Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry · Czech Technical University in Prague · +1 more institution

Indexed in Crossref · PubMed

Abstract

Characterizing biological and environmental samples at a molecular level primarily uses tandem mass spectrometry (MS/MS), yet the interpretation of tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for predictions from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the…
