Learning inverse folding from millions of predicted structures
Meta (United States) · University of California, Berkeley · +2 more institutions
Abstract
We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem have so far been limited by the number of experimentally determined protein structures available. We augment the training data by nearly three orders of magnitude by predicting structures for 12M protein sequences with AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones, and 72% recovery for buried residues. This is an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety…
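Native sequence recovery, the metric behind the 51% and 72% figures, is the fraction of positions where the designed residue matches the native one. Below is a minimal sketch for a single protein. This excerpt does not say how recovery is aggregated across proteins or how "buried" is defined. The function name `sequence_recovery` and the boolean `mask` (standing in for a per-residue buried flag) are therefore hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Optional, Sequence


def sequence_recovery(
    native: str,
    designed: str,
    mask: Optional[Sequence[bool]] = None,
) -> float:
    """Fraction of scored positions where the designed residue equals the native one.

    `mask` is a hypothetical per-residue selector (e.g. a 'buried' flag);
    when omitted, every position is scored.
    """
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have equal length")
    if mask is None:
        mask = [True] * len(native)
    if len(mask) != len(native):
        raise ValueError("mask length must match sequence length")

    # Keep only the (native, designed) residue pairs selected by the mask.
    scored = [(n, d) for n, d, keep in zip(native, designed, mask) if keep]
    if not scored:
        raise ValueError("mask selects no positions")

    matches = sum(1 for n, d in scored if n == d)
    return matches / len(scored)


native = "MKTAYIAK"
designed = "MKSAYLAK"
buried = [False, False, True, True, True, True, False, False]  # hypothetical flags

overall = sequence_recovery(native, designed)           # 6 of 8 match
buried_only = sequence_recovery(native, designed, buried)  # 2 of 4 match
print(overall, buried_only)  # prints 0.75 0.5
```

Scoring a subset with a mask, rather than calling a separate function, keeps the overall and buried figures on the same definition. Only the set of positions counted differs.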
Citation impact
- References: 73
Authors: 8

Topics & keywords
- Sequence (biology)
- Computer science
- Protein design
- Invariant (physics)
- Algorithm
- Inverse
- Protein folding
- Protein structure