Measuring and Narrowing the Compositionality Gap in Language Models

Press, Ofir; Zhang, Muru; Min, Sewon; Schmidt, Ludwig; Smith, Noah A.; Lewis, Mike

doi:10.18653/v1/2023.findings-emnlp.378

Article · Jan 1, 2023 · Gold OA

Mosaic · University of Washington · +1 more institution

Indexed in Crossref

Abstract

We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, we show that as model size increases, single-hop question answering performance improves faster than multi-hop performance does; therefore, the compositionality gap does not decrease. This…
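As an illustration of the ratio described in the abstract, the following is a minimal Python sketch, not the authors' evaluation code. It assumes the gap is computed over questions whose sub-questions were all answered correctly, and counts the share of those where the overall answer was still wrong. The `QuestionResult` record, its field names, and the sample data are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Outcome for one multi-hop question (hypothetical record format)."""
    sub_correct: list[bool]   # one flag per sub-question
    composed_correct: bool    # whether the overall multi-hop answer was right


def compositionality_gap(results: list[QuestionResult]) -> float:
    """Share of questions with every sub-answer right but the overall answer wrong.

    Assumption: the denominator is restricted to questions for which the
    model answered all sub-questions correctly.
    """
    eligible = [r for r in results if all(r.sub_correct)]
    if not eligible:
        raise ValueError("no question had all sub-questions answered correctly")
    missed = sum(1 for r in eligible if not r.composed_correct)
    return missed / len(eligible)


# Made-up outcomes for four two-hop questions (illustrative only).
results = [
    QuestionResult([True, True], True),
    QuestionResult([True, True], False),
    QuestionResult([True, False], False),  # excluded: a sub-question was missed
    QuestionResult([True, True], False),
]
gap = compositionality_gap(results)
```

Under this reading, a gap that stays flat as models grow means that knowing more individual facts does not by itself translate into composing them.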

Citation impact

Total citations: 216

FWCI: 35.98
Percentile: 100%
References: 62

Authors: 6

Topics & keywords

Keywords

Principle of compositionality
Computer science
Ask price
Recall
Language model
Question answering
Natural language processing
Artificial intelligence

UN Sustainable Development Goals

Quality Education

Funding

National Science Foundation