Article · ACM SIGIR Forum · Aug 2, 2017 · Closed access

Evaluating Evaluation Measure Stability

National Institute of Standards and Technology


Abstract

This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that a good experiment needs at least 25 queries and that 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest researchers using Web measures such as Precision at…
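
To make the measures and the notion of an "error rate" concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the paper's exact procedure. It computes Precision at k (k = 30 by default) and non-interpolated Average Precision for one ranked list, plus one plausible swap-style error rate: for every pair of systems and every query subset, compare mean scores, treat differences within a relative fuzziness threshold as ties, and count the minority outcome as errors. The function names, the `scores[system][topic]` layout, the `fuzziness=0.05` default, and how `query_sets` are chosen are all hypothetical choices made for this example.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top k retrieved documents that are relevant.

    If fewer than k documents were retrieved, the missing ranks count as non-relevant.
    """
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated Average Precision for one topic.

    Sums precision at the rank of each retrieved relevant document and divides by
    the total number of relevant documents, so unretrieved relevant ones count as 0.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def error_rate(scores: Dict[str, Dict[str, float]],
               query_sets: List[List[str]],
               fuzziness: float = 0.05) -> float:
    """Hypothetical swap-style error rate over all pairs of systems.

    scores[system][topic] holds one measure's per-topic score. For each pair of
    systems and each query set, the mean scores are compared; a difference within
    `fuzziness` (relative to the larger mean) counts as a tie. The smaller of the
    two win counts is treated as the number of errors for that pair.
    """
    errors = 0
    comparisons = 0
    for a, b in combinations(sorted(scores), 2):
        a_wins = b_wins = ties = 0
        for qs in query_sets:
            mean_a = sum(scores[a][q] for q in qs) / len(qs)
            mean_b = sum(scores[b][q] for q in qs) / len(qs)
            if abs(mean_a - mean_b) <= fuzziness * max(mean_a, mean_b):
                ties += 1
            elif mean_a > mean_b:
                a_wins += 1
            else:
                b_wins += 1
        errors += min(a_wins, b_wins)
        comparisons += a_wins + b_wins + ties
    return errors / comparisons if comparisons else 0.0


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d5"}
    print(precision_at_k(ranking, relevant, k=4))  # 2 relevant in top 4 -> 0.5
    print(average_precision(ranking, relevant))    # (1/1 + 2/3) / 3, about 0.556

    # Toy per-topic scores: A wins on q1, B wins on q2, and they tie on {q1, q2}.
    scores = {"A": {"q1": 0.5, "q2": 0.5}, "B": {"q1": 0.2, "q2": 0.8}}
    print(error_rate(scores, [["q1"], ["q2"], ["q1", "q2"]]))  # 1 error in 3 comparisons
```

Feeding `error_rate` per-topic P@30 scores and then per-topic AP scores for the same systems and query subsets gives the kind of side-by-side comparison the abstract summarizes. The toy numbers above say nothing about the paper's actual findings, and the fuzziness threshold and subset sampling would need to follow the paper's own setup for any real comparison.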

Citation impact

Total citations: 567 · FWCI: 38.48 · Percentile: 100% · References: 33

Authors: 2

Keywords
  • Computer science
  • Measure (data warehouse)
  • Rule of thumb
  • Information retrieval
  • Stability (learning theory)
  • Data mining
  • Machine learning
  • Algorithm