Article · Aug 24, 2003 · Closed access

Information-theoretic co-clustering

The University of Texas at Austin · IBM Research - Almaden

Indexed in Crossref

Abstract

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that…
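The objective stated in the abstract can be illustrated with a small sketch. The Python below is not the paper's algorithm (the abstract is truncated before describing it); it only evaluates the stated objective for cluster assignments that are given by hand. The function names (`normalize`, `mutual_information`, `cluster_joint`) and the toy table are assumptions made for illustration.

```python
import math


def normalize(table):
    """Turn a contingency table of counts into an empirical joint distribution."""
    total = sum(sum(row) for row in table)
    return [[v / total for v in row] for row in table]


def mutual_information(p):
    """Mutual information (in bits) of a joint distribution given as a 2-D list."""
    row_marg = [sum(row) for row in p]
    col_marg = [sum(col) for col in zip(*p)]
    mi = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            if pij > 0:  # zero cells contribute nothing
                mi += pij * math.log2(pij / (row_marg[i] * col_marg[j]))
    return mi


def cluster_joint(p, row_labels, col_labels, k, l):
    """Joint distribution of the clustered variables: sum p over each co-cluster."""
    q = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            q[row_labels[i]][col_labels[j]] += pij
    return q


# Hypothetical 4x4 co-occurrence table with two obvious blocks.
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
p = normalize(counts)

# Two hand-picked co-clusterings with 2 row clusters and 2 column clusters.
aligned = cluster_joint(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
mixed = cluster_joint(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

mi_full = mutual_information(p)
mi_aligned = mutual_information(aligned)
mi_mixed = mutual_information(mixed)
print(mi_full, mi_aligned, mi_mixed)
```

On this toy table the block-aligned grouping keeps all of the original mutual information (1 bit), while the grouping that mixes the two blocks keeps none. Grouping rows and columns can never increase mutual information, so maximizing the clustered mutual information amounts to losing as little of the original as possible under the fixed cluster counts.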

Citation impact

Total citations: 1,010
FWCI: 25.84
Percentile: 100%
References: 29

Authors

3

Topics & keywords

Keywords
  • Cluster analysis
  • Contingency table
  • Row and column spaces
  • Computer science
  • Biclustering
  • Mutual information
  • Data mining
  • Row
