Comparison and evaluation of statistical error models for scRNA-seq

Choudhary, Saket; Satija, Rahul

doi:10.1186/s13059-021-02584-9

Article | Genome Biology | Jan 18, 2022 | Gold OA

New York Genome Center

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate.

Results

Here, we analyze 59 scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. While a Poisson error model appears appropriate for sparse datasets, we observe clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, we find that the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach to parameter estimation.
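The distinction between the two error models can be illustrated with a small simulation. The sketch below is not the authors' pipeline. It assumes the common negative binomial parameterization, in which variance = mu + mu^2/theta (theta being the inverse overdispersion and Poisson the limit as theta goes to infinity), and it uses a simple method-of-moments estimate rather than a fitted regression model. The function name `overdispersion_summary`, the simulated means, and `theta_true = 10` are hypothetical choices made for illustration only.

```python
import numpy as np


def overdispersion_summary(counts):
    """Return per-gene mean, variance, and a method-of-moments theta.

    counts: array of shape (n_cells, n_genes) holding UMI counts.
    Under a negative binomial model, var = mu + mu**2 / theta, so
    theta = mu**2 / (var - mu). Genes with var <= mu show no excess
    variance and are assigned theta = inf (the Poisson limit).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    excess = var - mean
    with np.errstate(divide="ignore", invalid="ignore"):
        theta = np.where(excess > 0, mean**2 / excess, np.inf)
    return mean, var, theta


# Simulated data (an assumption; no real dataset is used here).
rng = np.random.default_rng(0)
n_cells = 5000
mu = np.array([0.05, 0.5, 5.0, 50.0])  # low to high gene abundance

# Poisson counts: variance equals the mean at every abundance.
poisson_counts = rng.poisson(mu, size=(n_cells, mu.size))

# Negative binomial counts drawn as a gamma-Poisson mixture.
theta_true = 10.0
rates = rng.gamma(shape=theta_true, scale=mu / theta_true,
                  size=(n_cells, mu.size))
nb_counts = rng.poisson(rates)

mean_p, var_p, theta_p = overdispersion_summary(poisson_counts)
mean_nb, var_nb, theta_nb = overdispersion_summary(nb_counts)

for name, m, v in [("poisson", mean_p, var_p), ("negbin", mean_nb, var_nb)]:
    # A variance/mean ratio near 1 is consistent with Poisson noise.
    print(name, np.round(v / m, 2))
```

In this toy setting the variance-to-mean ratio of the negative binomial genes stays close to 1 at low abundance and only departs clearly from 1 at higher abundance, which is one way to see why sparse data can look Poisson even when overdispersion is present.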

Citation impact

Total citations: 652

FWCI: 48.77
Percentile: 100%
References: 78

Authors

Saket Choudhary (corresponding author), New York Genome Center
Rahul Satija, New York Genome Center

Topics & keywords

Keywords

Workflow
Preprocessor
Biology
Variation (astronomy)
RNA-Seq
Key (lock)
Computational biology
Statistical model