The Pfam protein families database: embracing AI/ML
European Bioinformatics Institute · Google (United States) · +8 more institutions
Abstract
The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also…
Citation impact
- FWCI: 50.67
- Percentile: 100%
- References: 30
Authors
- 18
Topics & keywords
- UniProt
- Biology
- Computational biology
- Structural genomics
- Protein domain
- Annotation
- Protein family
- RefSeq