FastText.zip: Compressing text classification models
Indexed in: arXiv, DataCite
Abstract
Online communities can be used to promote destructive behaviours, as in pro-Eating Disorder (ED) communities. Research needs annotated data to study these phenomena. Although many platforms already moderate this type of content, Twitter does not, so its data can still be used for research purposes. In this paper, we unveiled emojis, words, and uncommon linguistic patterns within the ED Twitter community by using the Correlation Explanation (CorEx) algorithm on unstructured and non-annotated data to retrieve the topics. We then annotated the dataset following these topics. We then analysed the use of CorEx and Word Mover’s Distance to automatically retrieve similar new sentences and augment the annotated…
Citation impact
- Total citations: 907
- FWCI: 28.05
- Percentile: 100%
- References: 27
Authors: 6
Topics & keywords
Keywords
- Quantization (signal processing)
- Computer science
- Margin (machine learning)
- Hash function
- Artificial intelligence
- Algorithm
- Pattern recognition (psychology)
- Natural language processing