Natural Questions: A Benchmark for Question Answering Research

Kwiatkowski, Tom; Palomaki, Jennimaria; Redfield, Olivia; Collins, Michael; Parikh, Ankur P.; Alberti, Chris; Epstein, Danielle; Polosukhin, Illia; Devlin, Jacob; Lee, Kenton; Toutanova, Kristina; Jones, Llion; Kelcey, Matthew; Chang, Ming‐Wei; Dai, Andrew M.; Uszkoreit, Jakob; Le, Quoc V.; Petrov, Slav

doi:10.1162/tacl_a_00276

Article. Transactions of the Association for Computational Linguistics. Aug 2, 2019. Diamond OA.

Google (United States)

Indexed in: Crossref, DOAJ

Abstract

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also…
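The annotation scheme and split sizes described in the abstract can be sketched as a small data structure. This is only an illustrative sketch: the names `Span`, `Annotation`, and `SPLITS`, and the use of token offsets, are assumptions made here and do not reflect the format of the released files.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end) offsets into the Wikipedia page.
Span = Tuple[int, int]


@dataclass
class Annotation:
    """One annotator's judgment for a (question, Wikipedia page) pair."""
    # None stands for "null": no long answer was found on the page.
    long_answer: Optional[Span] = None
    # An empty list stands for "no short answer"; otherwise one or more entities.
    short_answers: List[Span] = field(default_factory=list)

    def is_null(self) -> bool:
        """True when neither a long nor a short answer was marked."""
        return self.long_answer is None and not self.short_answers


# Split sizes as stated in the abstract: (examples, annotations per example).
SPLITS = {
    "train": (307_373, 1),
    "dev": (7_830, 5),
    "test": (7_842, 5),
}


def total_examples() -> int:
    """Number of (question, page) examples across all splits."""
    return sum(n for n, _ in SPLITS.values())


def total_annotations() -> int:
    """Number of individual annotations, counting 5-way examples five times."""
    return sum(n * k for n, k in SPLITS.values())
```

Under these assumptions, the three splits sum to 323,045 examples, and counting each 5-way example five times gives 385,733 individual annotations.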

Citation impact

Total citations: 1,978

FWCI: 140.15
Percentile: 100%
References: 34

Authors: 18

Topics & keywords

Keywords

Computer science
Question answering
Paragraph
Annotation
Benchmark (surveying)
Information retrieval
Task (project management)
Set (abstract data type)