Article | Bioinformatics | May 7, 2014 | Hybrid OA

SAMBLASTER: fast duplicate marking and structural variant read extraction

University of Virginia

Indexed in: arXiv, Crossref, DOAJ

Abstract

Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.

Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling.…
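As a rough illustration of what marking duplicates in a read-grouped SAM stream can look like, the sketch below uses a common simplification: two read pairs are treated as duplicates when both mates share the same reference name, strand and unclipped 5' position. This is not SAMBLASTER's actual implementation. The function names (`five_prime`, `signature`, `mark_duplicates`) are hypothetical, and the policy of never marking pairs with an unmapped mate is an assumption made to keep the example short. The flag bits come from the SAM format.

```python
import re
from itertools import groupby

# FLAG bits defined by the SAM format.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime(pos, cigar, reverse):
    """Return the unclipped 5' reference coordinate (1-based) of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if reverse:
        # On the reverse strand the 5' end is the rightmost base,
        # so walk the trailing clips and add the reference span.
        ops_for_clip = list(reversed(ops))
    else:
        ops_for_clip = ops
    clip = 0
    for n, op in ops_for_clip:
        if op not in "SH":
            break
        clip += n
    if not reverse:
        return pos - clip
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    return pos + ref_len - 1 + clip


def signature(records):
    """Build a hashable key from the primary alignments of one read-id group.

    Returns None when any primary alignment is unmapped (assumption of this
    sketch: such groups are never marked as duplicates).
    """
    ends = []
    for fields in records:
        flag = int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if flag & FLAG_UNMAPPED:
            return None
        reverse = bool(flag & FLAG_REVERSE)
        ends.append((fields[2], five_prime(int(fields[3]), fields[5], reverse), reverse))
    return tuple(sorted(ends)) if ends else None


def mark_duplicates(lines):
    """Yield SAM lines, setting the duplicate flag on repeated signatures.

    Assumes all alignments of a read id are adjacent in the input, as they
    are in unsorted aligner output.
    """
    seen = set()

    def read_id(line):
        # Header lines start with "@" and are passed through unchanged.
        return None if line.startswith("@") else line.split("\t", 1)[0]

    for name, group in groupby(lines, key=read_id):
        if name is None:
            yield from group
            continue
        records = [line.rstrip("\n").split("\t") for line in group]
        sig = signature(records)
        if sig is not None:
            if sig in seen:
                for fields in records:
                    fields[1] = str(int(fields[1]) | FLAG_DUPLICATE)
            else:
                seen.add(sig)
        for fields in records:
            yield "\t".join(fields) + "\n"
```

Because the generator consumes and produces lines one read group at a time, it could be fed from standard input and written to standard output, which is the shape a piped post-pass between an aligner and a BAM compressor would need. Only the set of signatures is held in memory, not the alignments themselves.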

Citation impact

  • Total citations: 1,018
  • FWCI: 9.96
  • Percentile: 100%
  • References: 3

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Bottleneck
  • Pipeline (software)
  • Source code
  • Pipeline transport
  • Parallel computing
  • Overhead (engineering)
  • Code (set theory)

Funding