CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model
Mayo Clinic · Southeast University · Baylor College of Medicine
Abstract
Thousands of novel transcripts have been identified using deep transcriptome sequencing. The discovery of this large, 'hidden' transcriptome has renewed the demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, the Coding Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic, and hexamer usage bias. CPAT software outperformed (sensitivity: 0.96, specificity: 0.97) other state-of-the-art alignment-based…
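As an illustration of the model described in the abstract, the following is a minimal Python sketch of how two of the four features (open reading frame size and coverage) might be computed and how all four could be combined by a logistic function. It is not CPAT's actual code. The function names, the forward-strand ATG-to-stop ORF definition, and the weights are hypothetical placeholders; the Fickett and hexamer scores are assumed to be precomputed elsewhere and are passed in as plain numbers.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length (nt) of the longest ATG-to-stop ORF on the forward strand.

    Assumption for this sketch: the stop codon is counted in the length
    and only the three forward reading frames are scanned.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # ORF closed; look for the next start
    return best


def orf_coverage(seq):
    """Fraction of the transcript covered by its longest ORF."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


def coding_probability(features, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


# Toy transcript: ATG AAA TTT TAG sits in frame 2 of a 16-nt sequence.
toy = "CCATGAAATTTTAGGG"
orf_size = longest_orf_length(toy)   # 12
coverage = orf_coverage(toy)         # 0.75

# Hypothetical values: Fickett and hexamer scores would come from
# separate calculations, and these weights are NOT CPAT's fitted ones.
fickett_score, hexamer_score = 0.9, 0.1
weights = [0.01, 1.0, 1.0, 1.0]
p = coding_probability(
    [orf_size, coverage, fickett_score, hexamer_score], weights, intercept=-2.0
)
```

In a real setting the weights and intercept would be fitted on labelled coding and noncoding transcripts, and a probability cutoff would be chosen on held-out data.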
Citation impact
- FWCI: 24.25
- Percentile: 100%
- References: 49
Authors (6)
- Liguo Wang (corresponding author)
Mayo Clinic, Southeast University, Baylor College of Medicine
- Hyun Jung Park
Southeast University, Mayo Clinic, Baylor College of Medicine
- Surendra Dasari
Baylor College of Medicine, Southeast University, Mayo Clinic
- Shengqin Wang
Southeast University, Mayo Clinic, Baylor College of Medicine
- Jean‐Pierre Kocher
Baylor College of Medicine, Mayo Clinic, Southeast University
Topics & keywords
- Biology
- Open reading frame
- Coding (social sciences)
- Software
- Computational biology
- Calculator
- Python (programming language)
- Coding region
- Quality Education