Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS
Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry · Czech Technical University in Prague · one other institution
Abstract
Characterizing biological and environmental samples at a molecular level primarily relies on tandem mass spectrometry (MS/MS), yet interpreting tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for predicting molecular properties from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset, mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the…
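The abstract names two self-supervised pre-training objectives: predicting masked spectral peaks and predicting chromatographic retention order. The Python sketch below shows one plausible way to build training targets for each from unannotated spectra. Because the abstract is truncated, everything specific here is an assumption rather than the authors' method: the `Spectrum` class, the `MASK` placeholder, the `mask_peaks` and `retention_order_label` helpers, the masking fraction, and the uniform random choice of which peaks to mask are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Spectrum:
    """Hypothetical container for one MS/MS spectrum."""
    precursor_mz: float
    peaks: list  # list of (m/z, intensity) tuples
    retention_time: float  # elution time within a single LC-MS run (assumed field)

# Hypothetical placeholder that replaces a masked peak in the model input.
MASK = (0.0, 0.0)

def mask_peaks(spec, frac=0.3, rng=None):
    """Masked-peak objective (sketch): hide about `frac` of the peaks.

    Returns (masked_peaks, targets), where targets maps each masked
    position to its original (m/z, intensity) pair, i.e. what the model
    would be trained to reconstruct. Peaks are chosen uniformly at random
    here, which is an assumption.
    """
    rng = rng or random.Random()
    n = len(spec.peaks)
    # Mask at least one peak when frac > 0, never more than n peaks.
    k = min(n, max(1, round(frac * n))) if n and frac > 0 else 0
    chosen = sorted(rng.sample(range(n), k))
    masked = list(spec.peaks)
    targets = {}
    for i in chosen:
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets

def retention_order_label(a, b):
    """Retention-order objective (sketch): 1 if spectrum `a` eluted
    before spectrum `b` in the same run, else 0."""
    return int(a.retention_time < b.retention_time)

if __name__ == "__main__":
    spec = Spectrum(181.07, [(85.03, 0.2), (109.03, 0.5), (163.06, 1.0), (145.05, 0.4)], 312.5)
    masked, targets = mask_peaks(spec, frac=0.5, rng=random.Random(0))
    print(masked)
    print(targets)
    later = Spectrum(200.1, [(120.0, 1.0)], 410.0)
    print(retention_order_label(spec, later))
```

Both objectives need only the raw spectra and their acquisition order, which is why they can use unannotated data at the scale the abstract describes; the real model's input encoding and loss functions are not specified in this excerpt.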
Citation impact
- FWCI: 27.73
- Percentile: 100%
- References: 103
Authors
- Roman Bushuiev (corresponding author)
Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry, Czech Technical University in Prague
- Anton Bushuiev
Czech Technical University in Prague
- Raman Samusevich
Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry, Czech Technical University in Prague
- Corinna Brungs
Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry
- Josef Šivic
Czech Technical University in Prague
Topics & keywords
- Mass spectrum
- Tandem
- Computer science
- Tandem mass spectrometry
- Artificial intelligence
- Annotation
- Spectral line
- Artificial neural network