Semantics derived automatically from language corpora contain human-like biases

Caliskan, Aylin; Bryson, Joanna J.; Narayanan, Arvind

doi:10.1126/science.aal4230

Article · Science · Apr 13, 2017 · Green OA

Princeton University · Center for Information Technology · +1 more institution

Indexed in: arXiv, Crossref, PubMed

Abstract

Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to…
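The abstract does not spell out how the associations are scored. As a rough illustration only, the sketch below assumes bias is measured as a differential cosine-similarity association between word vectors, in the spirit of an Implicit Association Test comparison. The function names and the toy two-dimensional vectors are hypothetical and do not come from this record; a real analysis would load vectors from a trained embedding model.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer word vector w sits to attribute set A than to set B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standardized difference in association between two target sets.

    Assumption: the spread is the sample standard deviation over all
    target words; other conventions (e.g. population std) are possible.
    """
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy vectors, invented purely for illustration.
flowers = [[1.0, 0.1], [0.9, 0.2]]
insects = [[0.1, 1.0], [0.2, 0.9]]
pleasant = [[1.0, 0.0], [0.8, 0.1]]
unpleasant = [[0.0, 1.0], [0.1, 0.8]]

d = effect_size(flowers, insects, pleasant, unpleasant)
print(f"effect size: {d:.3f}")
```

A positive value here means the first target set (flowers) lies closer to the first attribute set (pleasant) than the second target set does; swapping the two target sets flips the sign.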

Citation impact

Total citations: 2,767

FWCI: 101.32
Percentile: 100%
References: 57

Authors: 3

Topics & keywords

Keywords

Word embedding
Computer science
Artificial intelligence
Test (biology)
Natural language processing
Replicate
Word (group theory)
Semantics (computer science)

UN Sustainable Development Goals

Quality Education
