Article · Nature · Sep 13, 2023 · Hybrid OA

Clustering predicted structures at the scale of the known protein universe

European Bioinformatics Institute · Seoul National University · +5 more institutions

Indexed in: Crossref, PubMed

Abstract

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy [1], and over 214 million predicted structures are available in the AlphaFold database [2]. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing…

Citation impact

  • Total citations: 357
  • FWCI: 51.64
  • Percentile: 100%
  • References: 65

Authors

10

Topics & keywords

Keywords
  • Cluster analysis
  • Structural similarity
  • Computational biology
  • Similarity (geometry)
  • Structural Classification of Proteins database
  • Protein structure database
  • Protein domain
  • Computer science
UN Sustainable Development Goals
  • Life on Land

Funding