Article · Nov 6, 2007 · Gold OA

A comparison of statistical significance tests for information retrieval evaluation

University of Massachusetts Amherst

Indexed in Crossref

Abstract

Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as non-parametric significance tests for IR but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t tests. Both the Wilcoxon and sign test…
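As a rough illustration of the randomization (permutation) test mentioned in the abstract, the sketch below applies a paired, two-sided, Monte Carlo version to per-topic scores from two runs. It is a minimal sketch, not the authors' implementation: the function name `paired_randomization_test`, the parameters `num_samples` and `seed`, and the average precision values are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def paired_randomization_test(scores_a, scores_b, num_samples=10000, seed=0):
    """Estimate a two-sided p-value for the difference in mean score.

    Under the null hypothesis the two runs are interchangeable on each
    topic, so the sign of every per-topic difference can be flipped at
    random. The p-value is the fraction of random sign assignments whose
    absolute mean difference is at least as large as the observed one.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must be scored on the same topics")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    at_least_as_extreme = 0
    for _ in range(num_samples):
        # Randomly swap the two runs' labels on each topic.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_samples


# Hypothetical per-topic average precision for two runs (illustrative only).
run_a = [0.42, 0.31, 0.58, 0.12, 0.77, 0.25, 0.49, 0.36, 0.61, 0.20]
run_b = [0.38, 0.29, 0.51, 0.15, 0.70, 0.22, 0.40, 0.33, 0.55, 0.18]
p_value = paired_randomization_test(run_a, run_b)
print(f"estimated two-sided p-value: {p_value:.4f}")
```

The test statistic here is the difference in mean average precision, the same quantity the abstract describes comparing. The sample count trades run time against the precision of the estimated p-value.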

Citation impact

  • Total citations: 716
  • FWCI: 26.15
  • Percentile: 100%
  • References: 31

Authors

3 authors

Topics & keywords

Keywords
  • Wilcoxon signed-rank test
  • Sign test
  • Statistical significance
  • Statistical hypothesis testing
  • Resampling
  • Statistics
  • Mathematics
  • Test (biology)