Preprint · bioRxiv (Cold Spring Harbor Laboratory) · Apr 10, 2022 · Green OA

Learning inverse folding from millions of predicted structures

Meta (United States) · University of California, Berkeley · +2 more institutions

Indexed in Crossref

Abstract

We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety…

Citation impact

Total citations: 415
References: 73

Authors

8

Topics & keywords

Keywords
  • Sequence (biology)
  • Computer science
  • Protein design
  • Invariant (physics)
  • Algorithm
  • Inverse
  • Protein folding
  • Protein structure