Article · Bioinformatics · Mar 22, 2007 · Hybrid OA

UniRef: comprehensive and non-redundant UniProt reference clusters

Georgetown University · Georgetown University Medical Center

Indexed in: Crossref, DOAJ, PubMed

Abstract

MOTIVATION: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. RESULTS: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and…
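As a rough illustration of the "combines identical sequences" step mentioned for UniRef100, the following minimal sketch groups records whose sequences match exactly. It is not the UniRef pipeline: the function name `cluster_identical`, the record format, the example identifiers, and the choice of the first member as representative are all assumptions made for illustration, and only the exact-identity case is covered.

```python
from collections import defaultdict


def cluster_identical(records):
    """Group (identifier, sequence) pairs whose sequences are exactly identical.

    Hypothetical helper for illustration only; not part of any UniProt tool.
    """
    groups = defaultdict(list)
    for identifier, sequence in records:
        # Normalise case so that 'mkt' and 'MKT' count as the same sequence.
        groups[sequence.upper()].append(identifier)
    # One cluster per distinct sequence. Taking the first member as the
    # representative is an arbitrary choice made for this sketch.
    return [
        {"representative": ids[0], "members": ids, "sequence": seq}
        for seq, ids in groups.items()
    ]


# Made-up identifiers and sequences, purely for demonstration.
records = [
    ("P1", "MKTAYIAK"),
    ("P2", "MKTAYIAK"),
    ("P3", "MKTAYIAR"),
]
clusters = cluster_identical(records)
for cluster in clusters:
    print(cluster["representative"], cluster["members"])
```

Collapsing exact duplicates this way shrinks the set that a similarity search has to scan while keeping every source identifier reachable through its cluster, which is the redundancy-hiding effect the abstract describes.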

Funding