Biases in Large Language Models: Origins, Inventory, and Discussion

Navigli, Roberto; Conia, Simone; Roß, Björn

doi:10.1145/3597307

Article · Journal of Data and Information Quality · May 16, 2023 · Hybrid OA

Sapienza University of Rome · University of Edinburgh

Indexed in Crossref

Abstract

In this article, we introduce and discuss the pervasive issue of bias in the large language models that are currently at the core of mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. Then, we survey the different types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with directions focused on measuring, reducing, and tackling the aforementioned types of bias.

Citation impact

Total citations: 320
FWCI: 53.02
Percentile: 100%
References: 121

Topics & keywords

Keywords

Computer science
Mainstream
Natural language processing
Gender bias
Selection (genetic algorithm)
Artificial intelligence
Selection bias
Data science

Funding

European Commission (EC)
Award: 726487