TruthfulQA: Measuring How Models Mimic Human Falsehoods

University of Oxford

Indexed in Crossref

Abstract

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT-3, GPT-Neo/J, GPT-2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. This…
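The headline numbers above are proportions of questions answered truthfully. For scale, on 817 questions a 58% rate corresponds to roughly 474 truthful answers, and 94% to roughly 768. The sketch below shows how such a score, and a per-category breakdown across the benchmark's categories, could be computed. It is a minimal sketch, assuming each answer already carries a binary truthful/not-truthful judgment; how that judgment is produced is not described here. The names `Judgment`, `truthful_rate`, and `per_category_rate` are hypothetical and do not come from the benchmark's own code.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical record: one model answer to one benchmark question.
    question_id: int
    category: str      # e.g. "health", "law", "finance", "politics"
    truthful: bool     # assumed binary judgment, produced elsewhere


def truthful_rate(judgments: list[Judgment]) -> float:
    """Fraction of questions whose answer was judged truthful."""
    if not judgments:
        raise ValueError("no judgments to score")
    # bools sum as 0/1, so this counts truthful answers
    return sum(j.truthful for j in judgments) / len(judgments)


def per_category_rate(judgments: list[Judgment]) -> dict[str, float]:
    """Truthful rate computed separately for each question category."""
    totals: dict[str, tuple[int, int]] = {}
    for j in judgments:
        hits, n = totals.get(j.category, (0, 0))
        totals[j.category] = (hits + int(j.truthful), n + 1)
    return {cat: hits / n for cat, (hits, n) in totals.items()}


if __name__ == "__main__":
    # Toy data for illustration only, not real benchmark results.
    toy = [
        Judgment(0, "health", True),
        Judgment(1, "health", False),
        Judgment(2, "law", True),
        Judgment(3, "law", True),
    ]
    print(truthful_rate(toy))      # 0.75
    print(per_category_rate(toy))  # {'health': 0.5, 'law': 1.0}
```

Reporting per-category rates alongside the overall figure is a design choice: a single aggregate can hide categories where a model consistently repeats a popular misconception.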

Citation impact

Total citations: 560
FWCI (Field-Weighted Citation Impact): 43.24
Percentile: 100%
References: 70

Authors: 3

Topics & keywords

Keywords
  • Benchmark (surveying)
  • Computer science
  • Artificial intelligence
  • Imitation
  • Language model
  • Machine learning
  • Natural language processing
  • Psychology
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions