Optimizing Semantic Coherence in Topic Models

Princeton University · University of Massachusetts Amherst · +1 more institution

Abstract

Large organizations often face the critical challenge of sharing information and maintaining connections between disparate subunits. Tools for automated analysis of document collections, such as topic models, can provide an important means for communication. The value of topic modeling is in its ability to discover interpretable, coherent themes from unstructured document sets, yet it is not unusual to find semantic mismatches that substantially reduce user confidence. In this paper, we first present an expert-driven topic annotation study, undertaken in order to obtain an annotated set of baseline topics and their distinguishing characteristics. We then present a metric for detecting poor-quality topics that…

Citation impact

Total citations: 1,247
FWCI: 41.32
Percentile: 100%
References: 17

Authors

5

Topics & keywords

Keywords
  • Latent Dirichlet allocation
  • Computer science
  • Topic model
  • Metric (unit)
  • Latent semantic analysis
  • Information retrieval
  • Linear subspace
  • Latent variable

Funding