GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins
Georgia Institute of Technology · The Wallace H. Coulter Department of Biomedical Engineering
Abstract
We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET we proposed a method of integration of unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to genome and extracts hints to splice sites…
Citation impact
- FWCI: 23.17
- Percentile: 100%
- References: 30
Authors (3)
Topics & keywords
- Gene
- Genome
- Pipeline (software)
- Computational biology
- Gene prediction
- Computer science
- Translation (biology)
- splice