Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts
Chinese Academy of Sciences · Institute of Computing Technology · Institute of Biophysics
Abstract
Classifying transcripts as protein-coding or non-coding is challenging, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied across species, CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data, which demonstrated gene evolutionary divergence between vertebrates,…
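To make "profiling adjoining nucleotide triplets" concrete, here is a minimal Python sketch that counts pairs of adjacent, non-overlapping triplets in one reading frame and sums a log-ratio over those pairs. The abstract does not give CNCI's actual scoring procedure, so the function names (`adjoining_triplet_counts`, `log_ratio_score`), the frequency tables, and the pseudocount are all hypothetical and serve only to illustrate the general idea.

```python
from collections import Counter
from math import log2


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent, non-overlapping nucleotide triplets in one frame.

    Hypothetical helper for illustration; not taken from the CNCI software.
    """
    seq = seq.upper().replace("U", "T")
    # Split the sequence into complete triplets starting at the given frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    valid = set("ACGT")
    pairs = Counter()
    for left, right in zip(triplets, triplets[1:]):
        # Skip pairs that contain ambiguous bases such as N.
        if set(left + right) <= valid:
            pairs[(left, right)] += 1
    return pairs


def log_ratio_score(seq, coding_freq, noncoding_freq, frame=0, pseudo=1e-6):
    """Sum log2(coding / non-coding) frequency ratios over adjoining triplet pairs.

    coding_freq and noncoding_freq are assumed to map (triplet, triplet) pairs
    to frequencies estimated from reference coding and non-coding sequences.
    """
    score = 0.0
    for pair, n in adjoining_triplet_counts(seq, frame).items():
        c = coding_freq.get(pair, 0.0) + pseudo
        nc = noncoding_freq.get(pair, 0.0) + pseudo
        score += n * log2(c / nc)
    return score
```

In this sketch a positive score means the observed triplet pairs are more typical of the coding reference table than of the non-coding one. Scoring each of the three values of `frame` separately is one simple way to handle an incomplete transcript whose reading frame is unknown.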
Citation impact
- FWCI: 13.91
- Percentile: 100%
- References: 27
Authors (9)
- Liang Sun (corresponding author)
Chinese Academy of Sciences, Institute of Computing Technology, Institute of Biophysics
- Haitao Luo
Chinese Academy of Sciences, Institute of Computing Technology, Institute of Biophysics
- Dechao Bu
Chinese Academy of Sciences, Institute of Computing Technology, Institute of Biophysics
- Guoguang Zhao
Chinese Academy of Sciences, Institute of Computing Technology, Institute of Biophysics
- Kuntao Yu
Chinese Academy of Sciences, Institute of Computing Technology, Institute of Biophysics
Topics & keywords
- Biology
- Coding region
- Computational biology
- Transcriptome
- Gene
- Coding (social sciences)
- Genetics
- Gene expression
- Life in Land