Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
The University of Texas at Austin
Abstract
Several large-scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2-normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multivariate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration…
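To make the setting concrete, the following is a minimal sketch of L2 normalization and of a vMF log-density. It is not the paper's EM procedure. The function names (`l2_normalize`, `dot`, `vmf_log_density_3d`) are hypothetical, and the density is restricted to three dimensions as a simplifying assumption, because there the normalizing constant has the closed form κ / (4π sinh κ); the general d-dimensional constant involves a modified Bessel function and is not shown.

```python
import math


def l2_normalize(x):
    """Scale x to unit Euclidean length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in x]


def dot(a, b):
    """Inner product; for unit vectors this equals the cosine similarity."""
    return sum(p * q for p, q in zip(a, b))


def vmf_log_density_3d(x, mu, kappa):
    """Log-density of a vMF distribution on the unit sphere in R^3.

    Assumes x and mu are unit vectors and kappa > 0 (the concentration).
    Illustrative only: uses the 3-d closed form kappa / (4*pi*sinh(kappa)).
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    # log(sinh(k)) = k + log(1 - exp(-2k)) - log(2), which avoids overflow
    # of sinh for large concentrations.
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    log_norm_const = math.log(kappa) - math.log(4.0 * math.pi) - log_sinh
    return log_norm_const + kappa * dot(mu, x)


# Hypothetical data: a mean direction and two observations.
mu = l2_normalize([1.0, 1.0, 0.0])
x_near = l2_normalize([2.0, 2.0, 0.1])
x_far = l2_normalize([-1.0, 0.0, 1.0])

log_p_near = vmf_log_density_3d(x_near, mu, kappa=5.0)
log_p_far = vmf_log_density_3d(x_far, mu, kappa=5.0)
```

Under this sketch, a point whose direction is close to `mu` receives a higher log-density than one pointing away from it, and increasing `kappa` sharpens that contrast.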
Citation impact
- FWCI: 21.03
- Percentile: 100%
- References: 57
Authors: 4
Topics & keywords
- Hypersphere
- Cluster analysis
- Mixture model
- Mathematics
- Pattern recognition (psychology)
- Cosine similarity
- Computer science
- Expectation–maximization algorithm