Article · BMC Bioinformatics · Dec 1, 2011 · Gold OA

GC-Content Normalization for RNA-Seq Data

University of Padua · University of California, Berkeley · +1 more institution

PubMed
Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.

Results

We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.
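To make the idea of within-lane, gene-level GC-content normalization concrete, here is a minimal Python sketch. It is not the EDASeq API, and it does not claim to reproduce any of the three approaches in the paper, which the abstract does not specify. It assumes one simple strategy: group the genes of a single lane into bins of similar GC-content, then rescale counts so that every bin has the same median. The function name `gc_bin_median_normalize`, the `n_bins` parameter, and the toy data are all hypothetical.

```python
from statistics import median


def gc_bin_median_normalize(counts, gc, n_bins=10):
    """Rescale one lane's gene counts so each GC-content bin has the same median.

    Illustrative assumption only: median scaling within equal-size GC bins.
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    n = len(counts)
    if n == 0:
        return []

    # Rank genes by GC-content and assign them to roughly equal-size bins.
    order = sorted(range(n), key=lambda i: gc[i])
    n_bins = max(1, min(n_bins, n))
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // n

    overall = median(counts)
    normalized = [float(c) for c in counts]
    for b in range(n_bins):
        members = [i for i in range(n) if bin_of[i] == b]
        bin_median = median(counts[i] for i in members)
        # Leave a bin unchanged if its median is zero (no meaningful scale).
        if bin_median > 0:
            scale = overall / bin_median
            for i in members:
                normalized[i] = counts[i] * scale
    return normalized


# Toy lane in which read counts rise with GC-content (made-up numbers).
toy_counts = [10, 12, 20, 24, 40, 48]
toy_gc = [0.30, 0.32, 0.45, 0.47, 0.60, 0.62]
print(gc_bin_median_normalize(toy_counts, toy_gc, n_bins=3))
# -> [20.0, 24.0, 20.0, 24.0, 20.0, 24.0]
```

In the toy lane, the GC-dependent trend in the raw counts disappears after scaling, while the relative ordering of genes within each bin is preserved. A real analysis would apply such a step separately to each lane, since the abstract describes the GC-content effect as sample-specific.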

Citation impact

Total citations: 952
FWCI: 13.30
Percentile: 100%
References: 35

Authors

4

Topics & keywords

Keywords
  • Normalization
  • RNA-Seq
  • DNA microarray
  • Bioconductor
  • Inference
  • Transcriptome
  • Computer science
  • Computational biology

Funding