Explicitly unbiased large language models still form biased associations

Bai, Xuechunzi; Wang, Angelina; Sucholutsky, Ilia; Griffiths, Thomas L.

doi:10.1073/pnas.2416228122

Article · Proceedings of the National Academy of Sciences · Feb 20, 2025 · Hybrid OA

University of Chicago · Stanford University · +2 more institutions

Indexed in: Crossref, PubMed

Abstract

Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: As LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two measures: LLM Word Association Test, a prompt-based method for revealing implicit bias; and LLM Relative Decision Test, a strategy to detect subtle discrimination in contextual decisions. Both measures are…
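A prompt-based word association measure of the kind the abstract describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the paper: the prompt wording, the function names (`build_prompt`, `parse_pairs`, `association_score`), the `word - choice` response format, and the scoring rule, which is a simple consistent-minus-inconsistent ratio rather than the authors' metric. The model call itself is left out; a hand-written response string stands in for it, and a neutral flower/insect example is used.

```python
# Illustrative sketch only: prompt text, names, and scoring are hypothetical,
# not the procedure or metric used in the paper.

def build_prompt(target_a, target_b, attributes):
    """Ask the model to pair every attribute word with one of two targets."""
    words = ", ".join(attributes)
    return (
        f"Here is a list of words. For each word pick either {target_a} or "
        f"{target_b} and write one pair per line as 'word - choice'. "
        f"The words are: {words}."
    )

def parse_pairs(response, targets, attributes):
    """Extract {attribute: chosen_target} from lines shaped like 'word - choice'."""
    pairs = {}
    for line in response.splitlines():
        word, sep, choice = line.partition(" - ")
        if not sep:
            continue  # skip lines that do not follow the expected format
        word, choice = word.strip().lower(), choice.strip().lower()
        if word in attributes and choice in targets:
            pairs[word] = choice
    return pairs

def association_score(pairs, target_a, attrs_a, target_b, attrs_b):
    """Return a score in [-1, 1]: +1 if every pairing matches the expected
    association, -1 if every pairing contradicts it, 0 if balanced or empty."""
    consistent = inconsistent = 0
    for word, choice in pairs.items():
        if word in attrs_a:
            expected = target_a
        elif word in attrs_b:
            expected = target_b
        else:
            continue
        if choice == expected:
            consistent += 1
        else:
            inconsistent += 1
    total = consistent + inconsistent
    return 0.0 if total == 0 else (consistent - inconsistent) / total

# Hypothetical stimulus set and a stand-in for a model response.
attrs_a = {"lovely", "pleasant"}
attrs_b = {"nasty", "awful"}
prompt = build_prompt("flower", "insect", sorted(attrs_a | attrs_b))
fake_response = "lovely - flower\npleasant - flower\nnasty - insect\nawful - flower"
pairs = parse_pairs(fake_response, {"flower", "insect"}, attrs_a | attrs_b)
score = association_score(pairs, "flower", attrs_a, "insect", attrs_b)
```

Because the sketch needs only the text of a response, it would work with a proprietary model whose embeddings are not accessible, which is the constraint the abstract raises.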

Citation impact

Total citations: 66

FWCI: 123.87
Percentile: 100%
References: 77

Authors: 4

Topics & keywords

Computer science
Econometrics
Mathematics
Linguistics
Statistical physics
Philosophy
Physics
