Article · Genome Biology · Aug 7, 2006 · Gold OA

GENCODE: producing a reference annotation for ENCODE

Wellcome Sanger Institute · Universitat Pompeu Fabra · +3 more institutions

PubMed
Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

Results

The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.
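The proportions implied by the counts above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in the Results paragraph; the variable names and the `percent` helper are illustrative and do not come from the paper.

```python
# Counts quoted in the Results paragraph; variable names are illustrative.
tested_predictions = 1215      # predictions tested outside the GENCODE annotation
validated_exon_pairs = 14      # novel exon pairs that were validated
intergenic_validated = 2       # validated pairs located in intergenic regions
novel_transcripts = 31
putative_transcripts = 15
noncoding_loci_validated = 46  # loci without evidence for a coding sequence


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


validation_rate = percent(validated_exon_pairs, tested_predictions)
intergenic_share = percent(intergenic_validated, validated_exon_pairs)
categories_sum = novel_transcripts + putative_transcripts

print(f"Validated outside the annotation: {validation_rate:.2f}%")
print(f"Share of validated pairs that are intergenic: {intergenic_share:.2f}%")
print(f"Novel + putative transcripts: {categories_sum}")
```

Under these counts, only about 1% of the tested predictions outside the annotation were validated, which is the basis for the abstract's point about comprehensiveness.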

Citation impact

Total citations: 649
FWCI: 11.59
Percentile: 100%
References: 39

Authors

15

Topics & keywords

Keywords
  • ENCODE
  • Biology
  • Human genetics
  • Genome Biology
  • Annotation
  • Computational biology
  • Evolutionary biology
  • Genetics

Funding