Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
University of California, San Francisco · University of Zurich · +1 more institution
Abstract
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error…
Citation impact
- FWCI: 14.64
- Percentile: 100%
- References: 75
Authors: 4
Topics & keywords
- UniProt
- GenBank
- KEGG
- Annotation
- Database
- Sequence database
- Biology
- Function (biology)
- Partnerships for the goals