Article · PLoS Computational Biology · Jul 17, 2014 · Gold OA

Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features

Johns Hopkins University · Institute for Research in Fundamental Sciences · +1 more institution

Indexed in: Crossref, DOAJ, PubMed

Abstract

Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust…
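The sparsity problem described in the abstract can be illustrated with a small counting sketch. This is not the gkm-SVM implementation: the function names, the toy sequence, and the (l, k) parameterization (windows of length l with k kept positions and l - k gaps) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gapped_kmer_counts(seq, l, k, gap="-"):
    """Count gapped k-mers: each window of length l contributes one
    feature per choice of k kept positions; the other l - k positions
    are masked with the gap symbol. (Illustrative definition.)"""
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for keep in combinations(range(l), k):
            kept = set(keep)
            feature = "".join(c if j in kept else gap
                              for j, c in enumerate(window))
            counts[feature] += 1
    return counts


seq = "ACGTACGTTACG"                   # hypothetical toy sequence
full = kmer_counts(seq, 6)             # contiguous 6-mers
gapped = gapped_kmer_counts(seq, 6, 4)  # l = 6, k = 4

print(max(full.values()))    # every 6-mer in this sequence occurs once
print(gapped["ACGT--"])      # a gapped feature shared by two windows
```

In this toy sequence every contiguous 6-mer appears exactly once, so the counts are effectively binary, while some gapped features are shared across windows and accumulate higher counts.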

Citation impact

  • Total citations: 571
  • FWCI: 13.66
  • Percentile: 100%
  • References: 39

Authors

4

Topics & keywords

Keywords
  • Support vector machine
  • Computer science
  • ENCODE
  • k-mer
  • Pattern recognition (psychology)
  • Bayes' theorem
  • Artificial intelligence
  • Classifier (UML)

Funding