articleMicrobiomeJul 6, 2017GOLD OA

VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data

University of Southern California · Fudan University

PubMed
Indexed incrossrefdoajpubmed

Abstract

Background

Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses.

Methods

We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder's performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014.

No related works found for this paper.

Funding