Abstract

NLP tasks are often limited by the scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant…
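To make the idea of emojis as noisy labels concrete, the following is a minimal sketch of how raw tweets could be turned into (text, label) pairs for an emoji-prediction pretraining task. It is an illustration under stated assumptions, not the authors' pipeline: the five-emoji `EMOJI_LABELS` list is a hypothetical stand-in for the 64 emojis (which are not listed here), and the filtering rule (keep only tweets with exactly one distinct label emoji, then strip that emoji from the text) is an assumed simplification. The helper names `distant_label` and `build_dataset` are likewise hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical label set: a tiny stand-in for the 64 emojis used in the paper.
# Each of these emojis is a single Unicode code point, so per-character
# matching below is sufficient.
EMOJI_LABELS = ["😂", "😍", "😭", "😡", "🙏"]
LABEL_INDEX = {emoji: i for i, emoji in enumerate(EMOJI_LABELS)}


def distant_label(tweet: str) -> Optional[Tuple[str, int]]:
    """Turn a raw tweet into a (text, label) pair using its emoji as a noisy label.

    Assumed rule (for illustration only): keep the tweet only if it contains
    exactly one distinct emoji from the label set, and remove every occurrence
    of that emoji so the model has to predict it from the remaining text.
    """
    found = {ch for ch in tweet if ch in LABEL_INDEX}
    if len(found) != 1:
        return None  # no label emoji, or an ambiguous mix of several
    emoji = found.pop()
    # Replace the emoji with spaces, then normalise whitespace.
    text = " ".join(tweet.replace(emoji, " ").split())
    return text, LABEL_INDEX[emoji]


def build_dataset(tweets: Iterable[str]) -> List[Tuple[str, int]]:
    """Apply distant labelling to a stream of tweets, dropping unlabeled ones."""
    pairs = (distant_label(t) for t in tweets)
    return [p for p in pairs if p is not None]


if __name__ == "__main__":
    tweets = ["so funny 😂😂", "love this 😍", "no emoji here", "mixed 😂 😭"]
    print(build_dataset(tweets))  # [('so funny', 0), ('love this', 1)]
```

Because no human annotation is involved, this kind of labelling scales to very large corpora; the trade-off is that the labels are noisy, since an emoji only loosely signals the emotional content of the surrounding text.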
