TruthfulQA: Measuring How Models Mimic Human Falsehoods

Lin, Stephanie; Hilton, Jacob; Evans, Owain

doi:10.18653/v1/2022.acl-long.229

Article. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Jan 1, 2022. Hybrid OA.

University of Oxford

Indexed in Crossref

Abstract

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT-3, GPT-Neo/J, GPT-2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. This…

Citation impact

Total citations: 560

FWCI: 43.24
Percentile: 100%
References: 70

Authors: 3

Topics & keywords

Keywords

Benchmark (surveying)
Computer science
Artificial intelligence
Imitation
Language model
Machine learning
Natural language processing
Psychology

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
