UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches
SIB Swiss Institute of Bioinformatics · European Bioinformatics Institute · Georgetown University · Georgetown University Medical Center · Muğla University
Abstract
Motivation: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters.
Results: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters,…
Citation impact
- FWCI: 17.78
- Percentile: 100%
- References: 34
Authors
- Barış Ethem Süzek (corresponding author)
- Yuqi Wang
- Hongzhan Huang
- Peter B. McGarvey
- Cathy Wu
Affiliations (listed identically for all five authors): SIB Swiss Institute of Bioinformatics, European Bioinformatics Institute, Georgetown University, Georgetown University Medical Center, Muğla University
Topics & keywords
- Annotation
- UniProt
- Computer science
- Scalability
- Consistency (knowledge bases)
- Cluster analysis
- Similarity (geometry)
- Data mining