CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine
Indexed in: Crossref, DOAJ, PubMed
Abstract
Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but function as noncoding RNAs (ncRNAs) instead. As millions of transcripts are generated by large-scale cDNA and EST sequencing projects every year, there is a need for automatic methods to distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC…
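As a rough illustration of what "sequence features" and "tenfold cross-validation" can look like in practice, the sketch below computes two ORF-based features for a transcript and splits a dataset into ten folds. The abstract as shown here does not list CPC's six features, so the two features used (longest forward-strand ORF length and ORF coverage) and all function names are assumptions made for illustration, not CPC's published feature set or implementation. Training the support vector machine itself is left out.

```python
# Illustrative sketch only: the features and names below are assumptions,
# not the feature set or code used by CPC.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG..stop ORF
    found in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_features(seq: str) -> list:
    """Hypothetical two-element feature vector: [ORF length, ORF coverage]."""
    orf_len = longest_orf_length(seq)
    coverage = orf_len / len(seq) if seq else 0.0
    return [float(orf_len), coverage]


def kfold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

In a full pipeline, feature vectors like these would be passed to an SVM trainer on each training fold and scored on the held-out fold; which library and kernel to use is a separate choice that this sketch does not make.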
Citation impact
- Total citations: 2,999
- FWCI: 2.69
- Percentile: 100%
- References: 32
Authors: 7
Topics & keywords
- Biology
- ENCODE
- Computational biology
- Support vector machine
- Coding region
- Coding (social sciences)
- Transcriptome
- Genetics
UN Sustainable Development Goals
- Reduced inequalities