Information-theoretic co-clustering
The University of Texas at Austin · IBM Research - Almaden
Abstract
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that…
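The objective described in the abstract can be made concrete with a small sketch. It normalizes a count table into an empirical joint distribution p(x, y), sums entries over hard row and column cluster assignments to get p(x̂, ŷ), and measures mutual information in bits. All function names here are hypothetical, and `brute_force_coclustering` is an exhaustive search used only to show what is being optimized. It is not the algorithm the paper proposes, whose description is truncated above. Because clustering only merges outcomes, I(X̂;Ŷ) ≤ I(X;Y), so maximizing I(X̂;Ŷ) is equivalent to minimizing the loss I(X;Y) − I(X̂;Ŷ).

```python
import itertools

import numpy as np


def mutual_information(joint: np.ndarray) -> float:
    """I(X;Y) in bits for a joint distribution given as a 2-D array summing to 1."""
    px = joint.sum(axis=1, keepdims=True)  # row marginal p(x), shape (m, 1)
    py = joint.sum(axis=0, keepdims=True)  # column marginal p(y), shape (1, n)
    nz = joint > 0                         # terms with p(x, y) = 0 contribute 0
    ratio = joint[nz] / (px @ py)[nz]      # p(x, y) / (p(x) p(y))
    return float(np.sum(joint[nz] * np.log2(ratio)))


def cluster_joint(joint, row_labels, col_labels, k, l):
    """Aggregate p(x, y) into p(x_hat, y_hat) by summing over cluster members."""
    clustered = np.zeros((k, l))
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            clustered[r, c] += joint[i, j]
    return clustered


def brute_force_coclustering(joint, k, l):
    """Try every hard assignment of rows to k clusters and columns to l clusters,
    keeping the one with the largest I(X_hat; Y_hat).
    Exponential cost: only usable on tiny tables (illustration, not the paper's method)."""
    m, n = joint.shape
    best = (-1.0, None, None)
    for rows in itertools.product(range(k), repeat=m):
        for cols in itertools.product(range(l), repeat=n):
            mi = mutual_information(cluster_joint(joint, rows, cols, k, l))
            if mi > best[0]:
                best = (mi, rows, cols)
    return best


if __name__ == "__main__":
    # A block-diagonal count table: rows 0-1 co-occur with columns 0-1,
    # rows 2-3 with columns 2-3.
    counts = np.array([[5, 5, 0, 0],
                       [5, 5, 0, 0],
                       [0, 0, 5, 5],
                       [0, 0, 5, 5]], dtype=float)
    joint = counts / counts.sum()  # empirical p(x, y)
    mi, rows, cols = brute_force_coclustering(joint, k=2, l=2)
    print(round(mutual_information(joint), 6), round(mi, 6))  # 1.0 1.0
    print(rows, cols)
```

On this block-diagonal table, grouping rows {0, 1} / {2, 3} and columns {0, 1} / {2, 3} keeps the full 1 bit of mutual information. Interleaving the clusters (labels (0, 1, 0, 1) for both rows and columns) instead produces a uniform, independent 2×2 table and loses all of it. Because exhaustive search grows exponentially with table size, a practical co-clustering algorithm has to avoid enumerating assignments.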
Citation impact
- FWCI (Field-Weighted Citation Impact): 25.84
- Percentile: 100%
- References: 29
Authors: 3
Topics & keywords
- Cluster analysis
- Contingency table
- Row and column spaces
- Computer science
- Biclustering
- Mutual information
- Data mining
- Row