Preprint | arXiv (Cornell University) | Dec 12, 2016 | Green OA

FastText.zip: Compressing text classification models

Indexed in: arXiv, DataCite

Abstract

Online communities can be used to promote destructive behaviours, as in pro-Eating Disorder (ED) communities. Research needs annotated data to study these phenomena. Even though many platforms have already moderated this type of content, Twitter has not, and it can still be used for research purposes. In this paper, we unveiled emojis, words, and uncommon linguistic patterns within the ED Twitter community by using the Correlation Explanation (CorEx) algorithm on unstructured and non-annotated data to retrieve the topics. We then annotated the dataset according to these topics. We then analysed the use of CorEx and Word Mover’s Distance to automatically retrieve similar new sentences and augment the annotated…

Citation impact

  • Total citations: 907
  • FWCI: 28.05
  • Percentile: 100%
  • References: 27

Authors

6

Topics & keywords

Keywords
  • Quantization (signal processing)
  • Computer science
  • Margin (machine learning)
  • Hash function
  • Artificial intelligence
  • Algorithm
  • Pattern recognition (psychology)
  • Natural language processing