Measuring and Narrowing the Compositionality Gap in Language Models
Mosaic · University of Washington · +1 more institution
Abstract
We investigate the ability of language models to perform compositional reasoning tasks in which the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but fail to generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions whose answers require composing multiple facts that are unlikely to have been observed together during pretraining. For the GPT-3 family of models, we show that as model size increases, single-hop question-answering performance improves faster than multi-hop performance does; the compositionality gap therefore does not decrease. This…
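As a rough illustration of the ratio described in the abstract, the sketch below computes a compositionality gap from per-question correctness flags. It rests on an assumption the abstract does not spell out: the denominator is taken to be the set of multi-hop questions for which every sub-question was answered correctly. The names `QuestionResult` and `compositionality_gap` are hypothetical and chosen only for this example, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QuestionResult:
    """Correctness flags for one multi-hop question (hypothetical record type)."""
    sub_correct: List[bool]   # one flag per sub-question (hop)
    composed_correct: bool    # whether the overall multi-hop answer was correct


def compositionality_gap(results: Iterable[QuestionResult]) -> float:
    """Fraction of questions with all sub-questions correct whose composed answer is wrong.

    Assumption: the ratio is taken over questions where every sub-question
    was answered correctly; questions with any wrong hop are excluded.
    """
    eligible = [r for r in results if r.sub_correct and all(r.sub_correct)]
    if not eligible:
        raise ValueError("no question had all sub-questions answered correctly")
    failures = sum(1 for r in eligible if not r.composed_correct)
    return failures / len(eligible)


# Illustrative, made-up data: two eligible questions, one composed failure.
example = [
    QuestionResult(sub_correct=[True, True], composed_correct=True),
    QuestionResult(sub_correct=[True, True], composed_correct=False),
    QuestionResult(sub_correct=[True, False], composed_correct=False),  # excluded
]
print(compositionality_gap(example))  # 0.5
```

Under this reading, a gap that stays flat as models grow means the larger model knows more individual facts but is no better at chaining the facts it already knows.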
Citation impact
- FWCI: 35.98
- Percentile: 100%
- References: 62
Authors: 6
Topics & keywords
- Principle of compositionality
- Computer science
- Ask price
- Recall
- Language model
- Question answering
- Natural language processing
- Artificial intelligence
- Quality Education