Identification of protein coding regions in RNA transcripts
Georgia Institute of Technology · The Wallace H. Coulter Department of Biomedical Engineering · Moscow Institute of Physics and Technology
Abstract
Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training, which makes manually curated training sets unnecessary. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcript assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and,…
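To make the idea of alignment-free gene finding in a transcript concrete, here is a minimal Python sketch of a naive longest-ORF scan. It is not the GeneMarkS-T algorithm: it uses no statistical model and no training, looks only at the forward strand, and assumes the standard genetic code. The function name `longest_orf` and its return format are hypothetical, chosen for illustration only.

```python
# Naive baseline, NOT GeneMarkS-T: find the longest ATG...stop open reading
# frame on the forward strand of a single transcript.
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest ORF, or None if there is none.

    start and end are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None  # resume the search after this stop codon
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAGGG"
    hit = longest_orf(transcript)
    if hit is not None:
        start, end, frame = hit
        print(start, end, frame, transcript[start:end])
```

A scan like this ignores incomplete ORFs at transcript ends, the reverse strand, and codon-usage statistics, which is the kind of information a trained statistical model would bring to the task.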
Citation impact
- FWCI: 4.22
- Percentile: 100%
- References: 29
Authors
- Shiyuyun Tang
Georgia Institute of Technology
- Alexandre Lomsadze
The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology
- Mark Borodovsky (corresponding author)
Moscow Institute of Physics and Technology, Georgia Institute of Technology, The Wallace H. Coulter Department of Biomedical Engineering
Topics & keywords
- Biology
- Computational biology
- RNA
- Gene
- Coding (social sciences)
- Non-coding RNA
- Translation (biology)
- Coding region