Natural Questions: A Benchmark for Question Answering Research

Google (United States)

Indexed in: Crossref, DOAJ

Abstract

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations, sequestered as test data. We present experiments validating the quality of the data. We also…
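The annotation scheme described in the abstract can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the token-offset span representation are hypothetical and are not taken from the released data format. Only the split sizes and the long/short/null distinction come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical span representation: (start, end) offsets into the page, end exclusive.
Span = Tuple[int, int]


@dataclass
class Annotation:
    """One annotator's judgment for a (question, Wikipedia page) pair."""
    long_answer: Optional[Span] = None  # typically a paragraph; None if absent
    short_answers: List[Span] = field(default_factory=list)  # one or more entities

    def is_null(self) -> bool:
        # "Null" here means neither a long nor a short answer was marked.
        return self.long_answer is None and not self.short_answers


@dataclass
class Example:
    question: str
    page_title: str
    annotations: List[Annotation]  # 1 for training, 5 for development and test


# Split sizes from the abstract: (number of examples, annotations per example).
SPLITS: Dict[str, Tuple[int, int]] = {
    "train": (307_373, 1),
    "dev": (7_830, 5),
    "test": (7_842, 5),
}


def total_annotations(splits: Dict[str, Tuple[int, int]]) -> int:
    """Total number of individual annotations implied by the split sizes."""
    return sum(examples * ways for examples, ways in splits.values())
```

Under this sketch, an annotation with a long answer but no short answer is not null, which matches the abstract's wording that null applies only when no long/short answer is present.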

Citation impact

  • Total citations: 1,978
  • FWCI: 140.15
  • Percentile: 100%
  • References: 34

Authors

18 authors

Topics & keywords

Keywords
  • Computer science
  • Question answering
  • Paragraph
  • Annotation
  • Benchmark (surveying)
  • Information retrieval
  • Task (project management)
  • Set (abstract data type)