Centering, scaling, and transformations: improving the biological information content of metabolomics data

Berg, Robert A. van den; Hoefsloot, Huub C. J.; Westerhuis, Johan A.; Smilde, Age K.; Werf, Mariët J. van der

doi:10.1186/1471-2164-7-142

Article · BMC Genomics · Jun 8, 2006 · Gold OA

Pension Fund for Care and Well-Being · University of Amsterdam

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Several aspects of the data hamper their biological interpretation. For instance, the concentrations of different metabolites in a metabolomics data set can differ 5000-fold, yet these differences are not proportional to the biological relevance of the metabolites. Data analysis methods, however, cannot make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data, thereby improving their biological interpretability.

Results

Seven data pretreatment methods (centering, autoscaling, Pareto scaling, range scaling, vast scaling, log transformation, and power transformation) were tested on a real-life metabolomics data set. They greatly affected the outcome of the data analysis and thus the rank of the biologically most important metabolites. Furthermore, the pretreatment method used prior to data analysis affected the stability of the rank, the influence of technical errors on the analysis, and the tendency of data analysis methods to select highly abundant metabolites.
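
The pretreatment methods named above can be sketched in a few lines of NumPy. The abstract does not state the formulas, so the definitions below follow the commonly used forms of these methods and should be read as assumptions: samples are taken to be in rows and metabolites in columns, the sample standard deviation (ddof=1) is used, the log and power transformations are taken as log10 and square root followed by centering, and the function name `pretreat` and the toy matrix `X` are hypothetical.

```python
import numpy as np

def pretreat(X, method):
    """Apply one pretreatment method column-wise.

    Assumed layout: rows are samples, columns are metabolites.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)      # sample standard deviation (assumption)
    centered = X - mean

    if method == "centering":
        return centered
    if method == "autoscaling":      # unit variance per metabolite
        return centered / std
    if method == "pareto":           # divide by sqrt of the standard deviation
        return centered / np.sqrt(std)
    if method == "range":            # divide by the observed range
        return centered / (X.max(axis=0) - X.min(axis=0))
    if method == "vast":             # autoscaling weighted by mean / std
        return (centered / std) * (mean / std)
    if method == "log":              # requires strictly positive values
        logged = np.log10(X)
        return logged - logged.mean(axis=0)
    if method == "power":            # square root, requires non-negative values
        rooted = np.sqrt(X)
        return rooted - rooted.mean(axis=0)
    raise ValueError(f"unknown method: {method}")

# Toy data: the second metabolite is 5000 times more abundant than the first.
X = np.array([[1.0, 5000.0],
              [2.0, 10000.0],
              [3.0, 15000.0]])

for name in ("centering", "autoscaling", "pareto", "range", "vast", "log", "power"):
    print(name)
    print(pretreat(X, name))
```

In this toy matrix, centering alone leaves the abundant metabolite with values thousands of times larger than the other, whereas autoscaling puts both columns on the same scale. This is the kind of difference in emphasis that the abstract reports as affecting metabolite ranking.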

Citation impact

Total citations: 2,422

FWCI: 12.81
Percentile: 100%
References: 32

Keywords

Interpretability
Biological data
Data set
Metabolomics
Computer science
Data mining
Set (abstract data type)
Relevance (law)
