Article | BMC Bioinformatics | Feb 18, 2010 | Gold OA

Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

University of California, Berkeley

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data.

Results

We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without a phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection.
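The contrast between total-count scaling and a quantile-based alternative can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy counts, and the choice of the per-lane upper quartile (taken over genes with at least one read in some lane) as the quantile are assumptions made for illustration only.

```python
from statistics import quantiles


def total_count_factors(lanes):
    """Scaling factors proportional to each lane's total read count."""
    totals = [sum(lane) for lane in lanes]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def upper_quartile_factors(lanes):
    """Scaling factors proportional to each lane's upper-quartile count.

    Assumption: genes with zero reads in every lane are dropped before
    the quartile is computed.
    """
    n_genes = len(lanes[0])
    keep = [g for g in range(n_genes) if any(lane[g] > 0 for lane in lanes)]
    uqs = []
    for lane in lanes:
        counts = [lane[g] for g in keep]
        # quantiles(..., n=4) returns the three quartile cut points;
        # index 2 is the upper quartile.
        uqs.append(quantiles(counts, n=4, method="inclusive")[2])
    mean_uq = sum(uqs) / len(uqs)
    return [u / mean_uq for u in uqs]


# Toy data (hypothetical): two lanes, eight genes. The lanes are identical
# except for one very highly expressed gene in lane B.
lane_a = [10, 20, 30, 40, 50, 60, 70, 80]
lane_b = [10, 20, 30, 40, 50, 60, 70, 8000]
lanes = [lane_a, lane_b]

tc = total_count_factors(lanes)
uq = upper_quartile_factors(lanes)
print("total-count factors:   ", tc)
print("upper-quartile factors:", uq)
```

In this toy case the single high-count gene inflates lane B's total, so dividing by the total-count factors would make the seven unchanged genes look under-expressed in lane B. The upper-quartile factors are equal for both lanes, so those genes stay comparable.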

Citation impact

Total citations: 1,783
FWCI: 43.69
Percentile: 100%
References: 23

Authors

4 authors

Topics & keywords

Keywords
  • Normalization (sociology)
  • Computational biology
  • Computer science
  • DNA microarray
  • Statistical inference
  • Statistical hypothesis testing
  • RNA-Seq
  • Data mining

Funding