Automatic Evaluation of Topic Coherence
University of California, Irvine · Data61 · +1 more institution
Abstract
This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show a simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results.…
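As an illustration of the co-occurrence measure the abstract mentions, the following is a minimal sketch of scoring a topic by the mean pointwise mutual information (PMI) of its word pairs. It is not the paper's implementation: the function name `pmi_coherence`, the toy corpus standing in for Wikipedia, the document-level co-occurrence counts, and the smoothing constant `eps` are all assumptions made for this example.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of the topic words.

    Assumption: two words co-occur when they appear in the same document.
    `eps` is a hypothetical smoothing constant that avoids log(0).
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        # PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ), smoothed.
        scores.append(math.log((p12 + eps) / (p1 * p2 + eps)))
    return sum(scores) / len(scores)


# Toy reference corpus (an assumption; the abstract refers to Wikipedia data).
docs = [
    ["space", "nasa", "launch", "orbit"],
    ["space", "nasa", "shuttle"],
    ["bank", "loan", "money"],
    ["bank", "money", "space"],
]

coherent = pmi_coherence(["space", "nasa"], docs)
incoherent = pmi_coherence(["nasa", "loan"], docs)
```

On this toy corpus, the word pair that co-occurs in documents scores higher than the pair that never does, which is the behaviour a coherence score of this kind is meant to capture.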
Citation impact
- FWCI: 29.38
- Percentile: 100%
- References: 36
Authors: 4
Topics & keywords
- WordNet
- Computer science
- Interpretability
- Coherence (philosophical gambling strategy)
- Natural language processing
- Task (project management)
- Set (abstract data type)
- Information retrieval
- Quality Education