Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training
Georgia Institute of Technology · The Wallace H. Coulter Department of Biomedical Engineering
Abstract
We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for…
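The idea of iterative unsupervised training mentioned in the abstract can be illustrated with a toy sketch. This is not the GeneMark-ES procedure: the real model has exon, intron, and intergenic states with their own submodels. The sketch assumes a hypothetical two-state HMM over single nucleotides, hand-picked initial parameters, and a Viterbi-style loop (decode, re-estimate from the decoded labels, repeat) as a stand-in. All names (`viterbi`, `reestimate`, `viterbi_training`), the pseudocount, and the example sequence are illustrative assumptions.

```python
import math

STATES = (0, 1)      # hypothetical labels: 0 = "background", 1 = "gene-like"
ALPHABET = "ACGT"


def viterbi(seq, start, trans, emit):
    """Return the most likely state path for seq (computed in log space)."""
    score = [[math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES]]
    back = []  # back[t-1][s] = best predecessor of state s at position t
    for t in range(1, len(seq)):
        row, ptr = [], []
        for s in STATES:
            best = max(STATES, key=lambda p: score[-1][p] + math.log(trans[p][s]))
            row.append(score[-1][best] + math.log(trans[best][s])
                       + math.log(emit[s][seq[t]]))
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position to the first
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def reestimate(seq, path, pseudo=1.0):
    """Re-estimate transitions and emissions from the decoded labels."""
    trans_counts = [[pseudo, pseudo] for _ in STATES]
    emit_counts = [{c: pseudo for c in ALPHABET} for _ in STATES]
    for t, (c, s) in enumerate(zip(seq, path)):
        emit_counts[s][c] += 1
        if t > 0:
            trans_counts[path[t - 1]][s] += 1
    trans = [[x / sum(row) for x in row] for row in trans_counts]
    emit = [{c: v / sum(d.values()) for c, v in d.items()} for d in emit_counts]
    return trans, emit


def viterbi_training(seq, start, trans, emit, max_iter=20):
    """Alternate decoding and re-estimation until the labelling stops changing."""
    path = None
    for _ in range(max_iter):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return path, trans, emit


# Toy input: an AT-rich stretch, a GC-rich stretch, then AT-rich again.
toy_seq = "AT" * 50 + "GC" * 50 + "AT" * 50
init_start = [0.5, 0.5]
init_trans = [[0.9, 0.1], [0.1, 0.9]]
init_emit = [
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # weak AT preference
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # weak GC preference
]
labels, trained_trans, trained_emit = viterbi_training(
    toy_seq, init_start, init_trans, init_emit
)
```

No labelled training set is supplied here: the only inputs are the anonymous sequence and weak initial guesses, and the parameters sharpen as the loop re-labels the same sequence.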
Citation impact
- References: 45
Authors
- Vardges Ter-Hovhannisyan (corresponding author)
  Georgia Institute of Technology
- Alexandre Lomsadze
  Georgia Institute of Technology, The Wallace H. Coulter Department of Biomedical Engineering
- Yury O. Chernoff
  Georgia Institute of Technology
- Mark Borodovsky
  Georgia Institute of Technology, The Wallace H. Coulter Department of Biomedical Engineering
Topics & keywords
- Biology
- Genome
- Gene prediction
- Computational biology
- Gene
- Genome project
- Caenorhabditis elegans
- Hidden Markov model