Article | Aug 22, 2014 | Closed access

A Dirichlet multinomial mixture model-based approach for short text clustering

Jianhua Yin, Jianyong Wang

Tsinghua University

Indexed in Crossref

Abstract

Short text clustering has become an increasingly important task with the popularity of social media like Twitter, Google+, and Facebook. It is a challenging problem because short texts are sparse, high-dimensional, and large in volume. In this paper, we propose a collapsed Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model for short text clustering (abbreviated as GSDMM). We find that GSDMM can infer the number of clusters automatically with a good balance between the completeness and homogeneity of the clustering results, and that it converges quickly. GSDMM can also cope with the sparsity and high dimensionality of short texts, and can obtain the representative words of each cluster. Our extensive…
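
The abstract names the algorithm, but this page truncates before any detail. As an illustration only, here is a minimal sketch of a collapsed Gibbs sampler for a Dirichlet multinomial mixture, in which each document belongs to exactly one cluster. It assumes the standard conditional for this model: a document joins cluster z with probability proportional to (m_z + alpha) times the product of (n_z^w + beta + j) over its word occurrences, divided by the product of (n_z + V*beta + i) over its length. The function name `gsdmm`, its parameters and defaults, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document.

    Hypothetical sketch: K is an upper bound on the number of clusters,
    and clusters that end up empty are simply never used.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})   # vocabulary size
    counts = [Counter(doc) for doc in docs]     # per-document word counts
    m = [0] * K                                 # m[z]: documents in cluster z
    n = [0] * K                                 # n[z]: total words in cluster z
    nw = [Counter() for _ in range(K)]          # nw[z][w]: count of w in cluster z

    def move(d, z, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster z.
        m[z] += sign
        n[z] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[z][w] += sign * c

    # Random initial assignment.
    labels = [rng.randrange(K) for _ in docs]
    for d, z in enumerate(labels):
        move(d, z, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, labels[d], -1)              # take d out of its cluster
            log_p = []
            for z in range(K):
                # Factors that do not depend on z are dropped.
                lp = math.log(m[z] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[z] + V * beta + i)
                log_p.append(lp)
            top = max(log_p)                    # stabilize before exponentiating
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(range(K), weights=weights)[0]
            move(d, labels[d], +1)              # put d into the sampled cluster
    return labels


# Toy usage with made-up documents (already tokenized).
docs = [
    ["apple", "banana", "fruit"],
    ["banana", "fruit", "juice"],
    ["goal", "match", "football"],
    ["football", "match", "score"],
]
labels = gsdmm(docs, K=4, seed=1)
print(labels, "non-empty clusters:", len(set(labels)))
```

Working in log space avoids underflow when the products run over many words. Because the sampler is stochastic, the labels depend on the seed, and the number of non-empty clusters is only an estimate bounded above by `K`.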

Citation impact

  • Total citations: 553
  • FWCI: 32.13
  • Percentile: 100%
  • References: 35

Authors

2

Topics & keywords

Keywords
  • Cluster analysis
  • Computer science
  • Latent Dirichlet allocation
  • Gibbs sampling
  • Multinomial distribution
  • Dirichlet distribution
  • Data mining
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education