Uniclust databases of clustered and deeply annotated protein sequences and alignments
Max Planck Institute for Biophysical Chemistry · Wellcome Trust · +3 more institutions
Abstract
We present three clustered protein sequence databases, Uniclust90, Uniclust50 and Uniclust30, and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP…
Citation impact
- FWCI: 8.88
- Percentile: 100%
- References: 26
Authors (6)
- Milot Mirdita
Max Planck Institute for Biophysical Chemistry
- Lars von den Driesch
Max Planck Institute for Biophysical Chemistry, Wellcome Trust, European Bioinformatics Institute
- Clovis Galiez
Max Planck Institute for Biophysical Chemistry
- María Martin
European Bioinformatics Institute, Wellcome Trust
- Johannes Söding
Max Planck Institute for Biophysical Chemistry
Topics & keywords
- UniProt
- Sequence database
- Biology
- Cluster analysis
- Sequence alignment
- Annotation
- Sequence (biology)
- Multiple sequence alignment