Explicitly unbiased large language models still form biased associations
University of Chicago · Stanford University · +2 more institutions
Abstract
Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: As LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two measures: LLM Word Association Test, a prompt-based method for revealing implicit bias; and LLM Relative Decision Test, a strategy to detect subtle discrimination in contextual decisions. Both measures are…
Citation impact
- FWCI: 123.87
- Percentile: 100%
- References: 77
Authors: 4
Topics & keywords
- Computer science
- Econometrics
- Mathematics
- Linguistics
- Statistical physics
- Philosophy
- Physics