UniRef: comprehensive and non-redundant UniProt reference clusters

Süzek, Barış Ethem; Huang, Hongzhan; McGarvey, Peter B.; Mazumder, Raja; Wu, Cathy

doi:10.1093/bioinformatics/btm098

Article · Bioinformatics · Mar 22, 2007 · Hybrid OA

Georgetown University · Georgetown University Medical Center

Indexed in: Crossref, DOAJ, PubMed

Abstract

MOTIVATION: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. RESULTS: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and…
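
As a rough illustration of the one step the abstract states explicitly, that UniRef100 combines identical sequences, the following minimal Python sketch groups records by exact sequence identity. It is not the authors' implementation: the function name `cluster_identical`, the accessions, and the example sequences are all hypothetical, and the sketch does not attempt anything beyond exact matches, since the abstract is truncated at that point.

```python
from collections import defaultdict


def cluster_identical(records):
    """Group (accession, sequence) pairs whose sequences are exactly identical.

    Hypothetical helper for illustration only; not part of any UniProt tool.
    """
    clusters = defaultdict(list)
    for accession, sequence in records:
        # Normalize case and strip whitespace so formatting differences
        # do not split otherwise identical sequences into separate clusters.
        key = "".join(sequence.split()).upper()
        clusters[key].append(accession)
    return dict(clusters)


# Made-up example records: two identical sequences and one longer sequence.
records = [
    ("ACC_1", "MKTAYIAKQR"),
    ("ACC_2", "mktayiakqr"),
    ("ACC_3", "MKTAYIAKQRQISFVK"),
]
clusters = cluster_identical(records)
```

Here `ACC_1` and `ACC_2` fall into one cluster, while `ACC_3` stays in its own cluster because this sketch only merges exact matches.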

Citation impact

Total citations: 1,679

FWCI: 13.45
Percentile: 100%
References: 53

Authors: 5

Topics & keywords

Keywords

UniProt
Cluster analysis
RefSeq
Computer science
Annotation
Sequence (biology)
Sequence database
Similarity (geometry)

Funding

European Bioinformatics Institute