Optimizing Semantic Coherence in Topic Models

Mimno, David; Wallach, Hanna; Talley, Edmund M.; Leenders, Miriam; McCallum, Andrew

Article · ScholarWorks@UMassAmherst (University of Massachusetts Amherst) · Jul 27, 2011 · Green OA

Princeton University · University of Massachusetts Amherst · +1 more institution

Abstract

Large organizations often face the critical challenge of sharing information and maintaining connections between disparate subunits. Tools for automated analysis of document collections, such as topic models, can provide an important means for communication. The value of topic modeling is in its ability to discover interpretable, coherent themes from unstructured document sets, yet it is not unusual to find semantic mismatches that substantially reduce user confidence. In this paper, we first present an expert-driven topic annotation study, undertaken in order to obtain an annotated set of baseline topics and their distinguishing characteristics. We then present a metric for detecting poor-quality topics that…

Citation impact

Total citations: 1,247

FWCI: 41.32
Percentile: 100%
References: 17

Authors: 5

Topics & keywords

Keywords

Latent Dirichlet allocation
Computer science
Topic model
Metric (unit)
Latent semantic analysis
Information retrieval
Linear subspace
Latent variable
