PyEvolve: a toolkit for statistical modelling of molecular evolution

Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J.; Isaev, Alexander; Huttley, Gavin

doi:10.1186/1471-2105-5-1

Article · BMC Bioinformatics · Jan 5, 2004 · Gold OA

Australian National University · Singapore Science Park

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Examining the distribution of variation has proven an extremely profitable technique for identifying sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences and ignore the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting information from the full distribution of sequence variation. Phylogeny-based maximum likelihood calculations can be applied to an extensive range of problems. Available software packages that can perform likelihood calculations, however, lack flexibility and scalability or employ error-prone approaches to model parameterisation.

Results

Here we describe the implementation of PyEvolve, a toolkit for applying existing statistical methods for molecular evolution and developing new ones. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs; defining it required 8 lines of standard Python code. Benchmarking was performed using either a dinucleotide or a codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or to a 10-species subset. Parallel performance gains of up to five-fold over serial execution were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours.
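To make the idea of a dinucleotide substitution model with a CpG parameter concrete, the following is a minimal sketch in plain Python. It is not PyEvolve's API and not the authors' 8-line definition. The names `rate`, `rate_matrix`, `kappa` and `cpg` are hypothetical, and the sketch assumes that the CpG parameter scales changes from CG to TG or CA, that only one position changes at a time, and that equilibrium frequencies can be ignored.

```python
from itertools import product

BASES = "ACGT"
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
# The 16 dinucleotide states: AA, AC, ..., TT.
DINUCS = ["".join(pair) for pair in product(BASES, repeat=2)]


def rate(src, dst, kappa=1.0, cpg=1.0):
    """Relative instantaneous rate from dinucleotide src to dst (illustrative)."""
    diffs = [(a, b) for a, b in zip(src, dst) if a != b]
    if len(diffs) != 1:
        return 0.0  # same state, or both positions change at once
    r = kappa if diffs[0] in TRANSITIONS else 1.0
    # Assumption: the CpG parameter applies to CG -> TG and CG -> CA.
    if src == "CG" and dst in ("TG", "CA"):
        r *= cpg
    return r


def rate_matrix(kappa=1.0, cpg=1.0):
    """Build a 16 x 16 rate matrix as nested dicts; each row sums to zero."""
    q = {s: {d: rate(s, d, kappa, cpg) for d in DINUCS} for s in DINUCS}
    for s in DINUCS:
        q[s][s] = -sum(v for d, v in q[s].items() if d != s)
    return q


Q = rate_matrix(kappa=2.0, cpg=5.0)
print(Q["CG"]["TG"])  # 10.0: transition rate (2.0) scaled by the CpG parameter (5.0)
```

A full model would also weight each rate by the equilibrium frequency of the target state and would estimate `kappa` and `cpg` by maximum likelihood on a phylogeny; both steps are omitted here for brevity.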

Citation impact

Total citations: 735
FWCI: 5.18
Percentile: 100%
References: 20

Keywords

Computer science
Python (programming language)
Software
Scalability
Benchmarking
Data mining
Statistical model
Source code
