Article · Nucleic Acids Research · Jul 1, 2007 · Gold OA

CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine

Peking University

Indexed in: Crossref, DOAJ, PubMed

Abstract

Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but function as noncoding RNAs (ncRNAs) instead. As millions of transcripts are generated by large-scale cDNA and EST sequencing projects every year, there is a need for automatic methods to distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC…
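The abstract names the ingredients (sequence features, a support vector machine, tenfold cross-validation) without detailing them. As a rough illustration only, the Python sketch below wires those ingredients together under explicit assumptions. The two ORF-based features in `features`, the linear SVM trained with the Pegasos stochastic subgradient method in `train_linear_svm`, and the synthetic transcripts from `toy_dataset` are all hypothetical stand-ins. They are not CPC's six features, its SVM configuration, or its training data.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in nt of the longest ATG...stop ORF on the forward strand (0 if none)."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG seen in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # ORF length includes the stop codon
                start = None
    return best


def features(seq: str) -> list:
    """HYPOTHETICAL features (not CPC's): longest-ORF length in kb and ORF coverage."""
    orf = longest_orf(seq)
    return [orf / 1000.0, orf / len(seq) if seq else 0.0]


def _aug(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return x + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos subgradient steps."""
    rng = random.Random(seed)
    data = [(_aug(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss active: move w toward label * x
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, _aug(x))) >= 0 else -1


def cross_validate(X, y, k=10, seed=0):
    """k-fold cross-validation accuracy (k=10 mirrors the tenfold scheme)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in held_out)
    return correct / len(X)


def _random_seq(n, rng):
    return "".join(rng.choice("ACGT") for _ in range(n))


def toy_dataset(n_per_class=50, length=1000, seed=1):
    """SYNTHETIC toy data: -1 = uniform random sequence, +1 = long ORF in random flanks."""
    rng = random.Random(seed)
    sense = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"
             if a + b + c not in STOP_CODONS]
    X, y = [], []
    for _ in range(n_per_class):
        X.append(features(_random_seq(length, rng)))
        y.append(-1)
        orf = "ATG" + "".join(rng.choice(sense) for _ in range(200)) + "TAA"  # 606 nt
        left = _random_seq(200, rng)
        right = _random_seq(length - len(left) - len(orf), rng)
        X.append(features(left + orf + right))
        y.append(1)
    return X, y


if __name__ == "__main__":
    X, y = toy_dataset()
    print(f"10-fold CV accuracy on toy data: {cross_validate(X, y):.2f}")
```

A linear kernel keeps the sketch dependency-free; a practical classifier would more likely rely on an established SVM library and a tuned, possibly nonlinear, kernel.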

Citation impact

Total citations: 2,999
FWCI: 2.69
Percentile: 100%
References: 32

Authors: 7

Topics & keywords

Keywords
  • Biology
  • ENCODE
  • Computational biology
  • Support vector machine
  • Coding region
  • Coding (social sciences)
  • Transcriptome
  • Genetics
UN Sustainable Development Goals
  • Reduced inequalities

Funding