Exploring structural diversity across the protein universe with The Encyclopedia of Domains
University College London · Institute of Structural and Molecular Biology
Abstract
The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly…
Citation impact
- FWCI: 23.61
- Percentile: 100%
- References: 55
Authors (8)
- Andy M. Lau (corresponding author)
University College London
- Nicola Bordin (corresponding author)
Institute of Structural and Molecular Biology, University College London
- Shaun M. Kandathil
University College London
- Ian Sillitoe
Institute of Structural and Molecular Biology, University College London
- Vaishali Waman
Institute of Structural and Molecular Biology, University College London
Topics & keywords
- Encyclopedia
- Domain (mathematical analysis)
- Protein domain
- Evolutionary biology
- Sequence (biology)
- Computational biology
- Biology
- Computer science