Abstract

NLP tasks are often limited by the scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant…
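To make emoji-based distant supervision concrete, here is a minimal Python sketch that turns raw tweets into noisy (text, label) training pairs. It is an illustration under stated assumptions, not the paper's pipeline. The five-emoji set stands in for the paper's 64 emojis, which are not listed here. The function name `distant_labels` is hypothetical. The rule of keeping only tweets with exactly one distinct label emoji, and stripping that emoji from the text, is also an assumption.

```python
# Hypothetical, reduced label set: a stand-in for the paper's 64 common emojis.
EMOJI_LABELS = ["😂", "😍", "😭", "😡", "😊"]
LABEL_INDEX = {emoji: i for i, emoji in enumerate(EMOJI_LABELS)}


def distant_labels(tweets):
    """Convert raw tweets into (text, label) pairs, using emojis as noisy labels.

    Assumptions (illustrative only): a tweet is kept only if it contains exactly
    one distinct emoji from EMOJI_LABELS, and every occurrence of that emoji is
    removed so the model cannot simply read off its own target.
    """
    examples = []
    for tweet in tweets:
        present = [e for e in EMOJI_LABELS if e in tweet]
        if len(present) != 1:
            continue  # no label emoji, or an ambiguous mix of several
        emoji = present[0]
        # Strip the label emoji and normalize whitespace.
        text = " ".join(tweet.replace(emoji, " ").split())
        if text:  # skip tweets that were nothing but the emoji
            examples.append((text, LABEL_INDEX[emoji]))
    return examples


if __name__ == "__main__":
    sample = ["so happy today 😊", "no emoji here", "lol 😂😂 this", "mixed 😂 and 😭"]
    print(distant_labels(sample))  # [('so happy today', 4), ('lol this', 0)]
```

Binarized emoticon supervision would collapse these classes into positive and negative. This sketch keeps each emoji as its own class, which is the kind of label diversity the abstract credits for richer representations.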

Citation impact

732 total citations

FWCI: 75.79
Percentile: 100%
References: 39


Authors: 5

Topics & keywords

Keywords

Emoji
Sarcasm
Sentiment analysis
Computer science
Artificial intelligence
Benchmark (surveying)
Natural language processing
Social media
