Natural Questions: A Benchmark for Question Answering Research
Indexed in: Crossref, DOAJ
Abstract
We present the Natural Questions corpus, a question answering data set. Questions consist of real, anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations, sequestered as test data. We present experiments validating the quality of the data. We also…
Citation impact
1,978
total citations
- FWCI: 140.15
- Percentile: 100%
- References: 34
Authors: 18
Topics & keywords
Topics
Keywords
- Computer science
- Question answering
- Paragraph
- Annotation
- Benchmark (surveying)
- Information retrieval
- Task (project management)
- Set (abstract data type)