Automated Hate Speech Detection and the Problem of Offensive Language

Davidson, Thomas; Warmsley, Dana; Macy, Michael W.; Weber, Ingmar

doi:10.1609/icwsm.v11i1.14955

Article · Proceedings of the International AAAI Conference on Web and Social Media · May 3, 2017 · Diamond OA

Cornell University · Hamad bin Khalifa University

Indexed in Crossref

Abstract

A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We then use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. We train a multi-class classifier to distinguish between these categories. Close…
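
The abstract's point about lexical methods can be illustrated with a small sketch. The lexicon terms, messages, and labels below are invented placeholders (not data or code from the paper), and the three labels simply mirror the categories named in the abstract. Under those assumptions, a detector that flags every message containing a lexicon term scores low precision for the hate speech class, because it also flags messages that are merely offensive or neither.

```python
# Toy illustration only: the lexicon terms, messages, and labels are
# invented placeholders, not data from the paper.
LEXICON = {"termx", "termy"}  # hypothetical stand-ins for lexicon keywords

# (message, gold label) with labels "hate", "offensive", or "neither"
SAMPLES = [
    ("termx people should be banned from here", "hate"),
    ("lol you are such a termx", "offensive"),
    ("that song goes termy hard", "offensive"),
    ("the word termy has a long history", "neither"),
    ("nice weather today", "neither"),
]


def lexical_flag(message, lexicon=LEXICON):
    """Return True if any lexicon term appears as a token in the message."""
    return any(token in lexicon for token in message.lower().split())


def precision(samples, positive="hate"):
    """Share of flagged messages whose gold label is the positive class."""
    flagged = [label for text, label in samples if lexical_flag(text)]
    if not flagged:
        return 0.0
    return sum(label == positive for label in flagged) / len(flagged)


if __name__ == "__main__":
    print(f"lexical precision for 'hate': {precision(SAMPLES):.2f}")
```

In this toy sample, four messages are flagged but only one is labeled as hate speech, which is the kind of gap a three-way labeling scheme is meant to expose.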

Citation impact

Total citations: 2,442

FWCI: 131.52
Percentile: 100%
References: 24

Authors: 4

Topics & keywords

Keywords

Offensive
Lexicon
Computer science
Classifier (UML)
Voice activity detection
Artificial intelligence
Speech recognition
Natural language processing
