Article | BMC Bioinformatics | Dec 1, 2010 | Gold OA

BIGSdb: Scalable analysis of bacterial genome variation at the population level

University of Oxford

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner.

Results

The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, open source, web-accessible database system that meets these needs. It allows phenotype and sequence data, ranging from a single sequence read to a whole genome, to be linked efficiently for an unlimited number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and adds the capacity to define and identify any number of loci, and genetic variants at those loci, within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names, and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. The software's laboratory information management system (LIMS) functionality enables linkage to, and organisation of, laboratory samples. The data are easily linked to external databases, and fine-grained access control permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The BIGSdb source code and documentation are available at http://pubmlst.org/software/database/bigsdb/.
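
The relationship between isolates, loci, alleles and schemes described above can be illustrated with a minimal Python sketch. It is not BIGSdb's schema or code: the class names, the `find_allele` and `scheme_profile` helpers, the toy sequences and the isolate names are all assumptions made for illustration, and the exact substring match is only a stand-in for whatever sequence-matching method the software actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Locus:
    """Hypothetical locus record: known alleles keyed by allele id."""
    name: str
    alleles: Dict[int, str] = field(default_factory=dict)


@dataclass
class Isolate:
    """Hypothetical isolate record with alternative names and stored sequences."""
    isolate_id: int
    name: str
    aliases: Set[str] = field(default_factory=set)
    contigs: List[str] = field(default_factory=list)


def find_allele(locus: Locus, isolate: Isolate) -> Optional[int]:
    """Return the lowest allele id whose sequence occurs as an exact
    substring of any stored contig, or None if no known allele is found."""
    for allele_id, sequence in sorted(locus.alleles.items()):
        if any(sequence in contig for contig in isolate.contigs):
            return allele_id
    return None


def scheme_profile(scheme: List[Locus], isolate: Isolate) -> Dict[str, Optional[int]]:
    """Treat a scheme as an ordered group of loci and map each locus
    name to the allele id identified in the isolate."""
    return {locus.name: find_allele(locus, isolate) for locus in scheme}


# Toy data: sequences are invented and far shorter than real alleles.
abcZ = Locus("abcZ", {1: "ATGCCGTA", 2: "ATGCCGTT"})
adk = Locus("adk", {1: "GGGTTTAA", 2: "GGGTTCAA"})
isolate = Isolate(
    isolate_id=1,
    name="example-1",
    aliases={"lab-0042"},
    contigs=["CCATGCCGTTGA", "TTGGGTTTAACC"],
)

profile = scheme_profile([abcZ, adk], isolate)
print(profile)
```

Because a scheme in this sketch is just a list of loci, the same stored contigs can be profiled under any number of alternative schemes without changing the isolate record.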

Citation impact

  • Total citations: 2,426
  • FWCI: 33.14
  • Percentile: 100%
  • References: 58

Authors

2

Topics & keywords

Keywords
  • Multilocus sequence typing
  • Genomics
  • Annotation
  • Scalability
  • Genome
  • Computer science
  • Software
  • Computational biology

Funding