GC-Content Normalization for RNA-Seq Data
University of Padua · University of California, Berkeley · +1 more institution
Abstract
Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.
We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.
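The abstract does not spell out the three within-lane approaches. The sketch below illustrates one plausible gene-level scheme, offered for illustration only. Within a single lane, genes are stratified into equal-size bins by GC-content, and each bin's counts are rescaled so that the bin medians agree. The function name `gc_bin_scale`, the equal-size binning and the use of the lane median as the common target are all assumptions of this sketch. It is not the EDASeq implementation, which is an R package.

```python
import statistics
from typing import List, Sequence


def gc_bin_scale(counts: Sequence[float], gc: Sequence[float], n_bins: int = 10) -> List[float]:
    """Hypothetical within-lane, gene-level GC-content normalization (sketch).

    Genes are sorted by GC-content and split into n_bins roughly equal-sized
    bins. Counts in each bin are multiplied by a single factor so that the
    bin's median equals the median count of the whole lane.
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    n = len(counts)
    if n == 0:
        return []
    # Never use more bins than genes, so every bin is non-empty.
    n_bins = max(1, min(n_bins, n))
    # Gene indices ordered by increasing GC-content.
    order = sorted(range(n), key=lambda i: gc[i])
    target = statistics.median(counts)  # common level all bins are scaled to
    normalized = [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        bin_median = statistics.median([counts[i] for i in idx])
        # A bin whose median is zero is left unscaled to avoid division by zero.
        factor = target / bin_median if bin_median > 0 else 1.0
        for i in idx:
            normalized[i] = counts[i] * factor
    return normalized
```

Take a toy lane with counts `[10, 12, 11, 40, 44, 42]`, GC fractions `[0.35, 0.36, 0.38, 0.60, 0.62, 0.61]` and two bins. The low-GC and high-GC bins have medians 11 and 42, and both are rescaled to the lane median of 26. Ratios between genes within the same bin are unchanged. Finer bins remove more of the GC trend, but each bin's scaling factor then rests on fewer genes.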
Citation impact
- FWCI: 13.30
- Percentile: 100%
- References: 35
Authors: 4
Topics & keywords
- Normalization (statistics)
- RNA-Seq
- DNA microarray
- Bioconductor
- Inference
- Transcriptome
- Computer science
- Computational biology