Optimizing Semantic Coherence in Topic Models
Princeton University · University of Massachusetts Amherst · +1 more institution
Abstract
Large organizations often face the critical challenge of sharing information and maintaining connections between disparate subunits. Tools for automated analysis of document collections, such as topic models, can provide an important means for communication. The value of topic modeling is in its ability to discover interpretable, coherent themes from unstructured document sets, yet it is not unusual to find semantic mismatches that substantially reduce user confidence. In this paper, we first present an expert-driven topic annotation study, undertaken in order to obtain an annotated set of baseline topics and their distinguishing characteristics. We then present a metric for detecting poor-quality topics that…
Citation impact
- FWCI: 41.32
- Percentile: 100%
- References: 17
Authors: 5
Topics & keywords
- Latent Dirichlet allocation
- Computer science
- Topic model
- Metric (unit)
- Latent semantic analysis
- Information retrieval
- Linear subspace
- Latent variable