Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts

Sun, Liang; Luo, Haitao; Bu, Dechao; Zhao, Guoguang; Yu, Kuntao; Zhang, Changhai; Liu, Yuanning; Chen, Runsheng; Zhao, Yi

doi:10.1093/nar/gkt646

Article · Nucleic Acids Research · Jul 27, 2013 · Gold OA

Chinese Academy of Sciences · Institute of Computing Technology · +1 more institution

Indexed in: Crossref, DOAJ, PubMed

Abstract

Classifying transcripts as protein-coding or non-coding is a challenge, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied across species, CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data, which demonstrated gene evolutionary divergence between vertebrates,…
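The abstract's central idea, profiling adjoining nucleotide triplets, can be illustrated with a small sketch. This is not the CNCI implementation: the function names, the choice of non-overlapping triplets read in a single frame, and the plain relative-frequency normalisation are all assumptions made here for illustration. It only shows what a usage profile of adjacent triplet pairs might look like for one sequence.

```python
from collections import Counter
from itertools import product

# All 64 unambiguous DNA triplets; triplets containing other symbols are skipped.
VALID_TRIPLETS = {"".join(p) for p in product("ACGT", repeat=3)}


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent, non-overlapping triplets in one reading frame.

    Hypothetical helper for illustration; not taken from the CNCI source code.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    # Split into complete triplets starting at the chosen frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter()
    for first, second in zip(triplets, triplets[1:]):
        if first in VALID_TRIPLETS and second in VALID_TRIPLETS:
            counts[(first, second)] += 1
    return counts


def adjoining_triplet_frequencies(seq, frame=0):
    """Normalise the pair counts to relative frequencies (empty dict if no pairs)."""
    counts = adjoining_triplet_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    example = "ATGGCCATGGCC"  # made-up sequence, not real data
    for frame in range(3):
        print(frame, dict(adjoining_triplet_counts(example, frame)))
```

A classifier built on such profiles would compare the pair frequencies of a query transcript against profiles learned from known coding and non-coding sequences; how CNCI does this is described in the paper itself and is not reproduced here.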

Citation impact

Total citations: 2,332
FWCI: 13.91
Percentile: 100%
References: 27

Authors: 9

Topics & keywords

Keywords

Biology
Coding region
Computational biology
Transcriptome
Gene
Coding (social sciences)
Genetics
Gene expression

UN Sustainable Development Goals

Life on Land