Evaluating Evaluation Measure Stability

Buckley, Chris; Voorhees, Ellen M.

doi:10.1145/3130348.3130373

Article, ACM SIGIR Forum, Aug 2, 2017, Closed access

National Institute of Standards and Technology

Indexed in Crossref

Abstract

This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as the rule that a good experiment needs at least 25 queries and that 50 is better, while challenging other beliefs, such as the assumption that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest researchers using Web measures such as Precision at…
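
For readers unfamiliar with the two measures the abstract compares, the following is a minimal sketch of their standard textbook definitions for a single query. It is not code from the paper: the function names, the document identifiers, and the choice to divide Average Precision by the total number of relevant documents are assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    # Divide by k even if fewer than k documents were retrieved.
    return hits / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero, so the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked list and relevance judgments for one query.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}

p_at_3 = precision_at_k(ranking, relevant, 3)
ap = average_precision(ranking, relevant)
```

Precision at a fixed cutoff looks only at the top of the ranking, whereas Average Precision takes the rank of every relevant document into account.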

Citation impact

Total citations: 567

FWCI: 38.48
Percentile: 100%
References: 33

Authors: 2

Topics & keywords

Computer science
Measure (data warehouse)
Rule of thumb
Information retrieval
Stability (learning theory)
Data mining
Machine learning
Algorithm

UN Sustainable Development Goals

Quality Education
