Article · PLoS Computational Biology · Dec 10, 2009 · Gold OA

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

University of California, San Francisco · University of Zurich · +1 more institution

Indexed in: Crossref, DOAJ, PubMed

Abstract

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error…

Citation impact

Total citations: 711
FWCI: 14.64
Percentile: 100%
References: 75

Authors

4

Topics & keywords

Keywords
  • UniProt
  • GenBank
  • KEGG
  • Annotation
  • Database
  • Sequence database
  • Biology
  • Function (biology)
UN Sustainable Development Goals
  • Partnerships for the goals

Funding