Article | MPG.PuRe (Max Planck Society) | Dec 1, 2005 | Green OA

Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

The University of Texas at Austin

Abstract

Several large-scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional. Such data is often L2-normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multivariate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration…
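
As an illustration of the two quantities the abstract names, the sketch below L2-normalizes a few vectors and then estimates a single vMF mean direction and concentration from them. It is a minimal sketch, not the paper's algorithm: the function names `l2_normalize` and `fit_vmf` are hypothetical, the sample vectors are made up, and the closed-form concentration approximation is an assumption, since the abstract is cut off before it names an estimator.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def fit_vmf(points):
    """Estimate (mu, kappa) of one vMF component from unit vectors.

    mu is the normalized resultant (sum) of the points.
    kappa uses the approximation r * (d - r^2) / (1 - r^2), where r is the
    mean resultant length; this choice is an assumption of this sketch.
    """
    n = len(points)
    d = len(points[0])
    resultant = [sum(p[i] for p in points) for i in range(d)]
    length = math.sqrt(sum(x * x for x in resultant))
    if length == 0.0:
        raise ValueError("points cancel out; mean direction is undefined")
    mu = [x / length for x in resultant]
    r_bar = length / n
    if r_bar >= 1.0:
        # All points coincide: the concentration is unbounded.
        return mu, math.inf
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa


# Made-up example vectors, normalized before fitting.
raw = [[3.0, 4.0, 0.0], [4.0, 3.0, 0.0], [3.0, 4.0, 1.0]]
unit = [l2_normalize(v) for v in raw]
mu, kappa = fit_vmf(unit)
```

A larger `kappa` means the points are more tightly concentrated around `mu`. A mixture model would fit one such pair per cluster, weighting each point by its cluster membership.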

Citation impact

Total citations: 807
FWCI: 21.03
Percentile: 100%
References: 57

Authors

4

Topics & keywords

Keywords
  • Hypersphere
  • Cluster analysis
  • Mixture model
  • Mathematics
  • Pattern recognition (psychology)
  • Cosine similarity
  • Computer science
  • Expectation–maximization algorithm