Statistical Comparisons of Classifiers over Multiple Data Sets

Demšar, Janez

Article · Dec 1, 2006 · Closed access

Abstract

While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over…
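As an illustration of the two tests named in the abstract, the following is a minimal sketch using SciPy. The accuracy matrix is synthetic and purely hypothetical (12 data sets, 3 classifiers), and the variable names are assumptions made for this example; it is not the article's own code, and it does not cover the post-hoc tests.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata, wilcoxon

# Hypothetical accuracies: rows are data sets, columns are classifiers.
# These numbers are generated for illustration only, not taken from the article.
rng = np.random.default_rng(0)
base = rng.uniform(0.70, 0.90, size=(12, 1))      # per-data-set difficulty
offsets = np.array([0.00, 0.02, 0.05])            # assumed classifier effects
scores = base + offsets + rng.normal(0.0, 0.01, size=(12, 3))

# Two classifiers: Wilcoxon signed-ranks test on paired per-data-set scores.
w_stat, w_p = wilcoxon(scores[:, 0], scores[:, 1])

# More than two classifiers: Friedman test over all columns.
f_stat, f_p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])

# Average rank of each classifier across data sets (rank 1 = best accuracy).
ranks = rankdata(-scores, axis=1)
avg_ranks = ranks.mean(axis=0)

print(f"Wilcoxon: statistic={w_stat:.3f}, p={w_p:.4f}")
print(f"Friedman: statistic={f_stat:.3f}, p={f_p:.4f}")
print("Average ranks:", np.round(avg_ranks, 2))
```

Both tests work on ranks rather than raw scores, so they do not assume normally distributed accuracy differences across data sets.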

Citation impact

Total citations: 11,214

FWCI: 235.17
Percentile: 100%
References: 39

Authors

Janez Demšar (corresponding author)

Topics & keywords

Keywords

Computer science
Statistical hypothesis testing
Wilcoxon signed-rank test
Machine learning
Artificial intelligence
Data set
Multiple comparisons problem
Set (abstract data type)