Ab initio gene identification in metagenomic sequences
Georgia Institute of Technology · The Wallace H. Coulter Department of Biomedical Engineering
Abstract
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate the parameters from dependencies, formed in evolution, between the frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has since been used for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing…
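The core idea of estimating coding-region statistics from genome composition can be illustrated with a small sketch. It assumes, as a simplification, that "genome nucleotide composition" is summarized by GC content and that each codon's frequency depends linearly on it. Both assumptions are ours, as are the helper names (`gc_content`, `codon_frequencies`, `fit_linear`, `train_heuristic`, `predict_codon_frequencies`) and the plain least-squares fit. This is not presented as the authors' exact procedure.

```python
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 codons


def gc_content(seq):
    """Fraction of G+C among unambiguous nucleotides (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def codon_frequencies(cds):
    """Relative codon frequencies in an in-frame coding sequence."""
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):  # step through complete codons only
        codon = cds[i:i + 3].upper()
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
    total = sum(counts.values())
    return {c: (n / total if total else 0.0) for c, n in counts.items()}


def fit_linear(xs, ys):
    """Ordinary least squares y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def train_heuristic(training):
    """training: list of (genome GC content, codon frequency dict) pairs.
    Returns one (slope, intercept) pair per codon (assumed linear model)."""
    gcs = [gc for gc, _ in training]
    return {c: fit_linear(gcs, [freqs[c] for _, freqs in training]) for c in CODONS}


def predict_codon_frequencies(model, gc):
    """Predict a codon frequency vector for a sequence of given GC content."""
    raw = {c: max(0.0, a * gc + b) for c, (a, b) in model.items()}  # clamp negatives
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()} if total else raw


if __name__ == "__main__":
    # Hypothetical toy training data, for illustration only.
    training = [
        (0.2, codon_frequencies("AAAAAAAAT")),
        (0.8, codon_frequencies("GCCGCCGGC")),
    ]
    model = train_heuristic(training)
    fragment = "ATGGCGCAATTAGCC"  # hypothetical anonymous fragment
    predicted = predict_codon_frequencies(model, gc_content(fragment))
    print(round(sum(predicted.values()), 6))
```

In practice the training pairs would come from many annotated genomes spanning a wide GC range. The predicted vector would then serve as a starting point for a gene finder's coding model. Clamping negative predictions and renormalizing keeps the output a valid probability distribution.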
Citation impact
- FWCI: 11.43
- Percentile: 100%
- References: 57
Authors: 3
Topics & keywords
- Biology
- Genome
- Computational biology
- Gene
- Metagenomics
- Gene prediction
- Genetics
- Bacterial genome size