Information-theoretic co-clustering

Dhillon, Inderjit S.; Mallela, Subramanyam; Modha, Dharmendra S.

doi:10.1145/956750.956764

Article · Aug 24, 2003 · Closed access

The University of Texas at Austin · IBM Research - Almaden

Indexed in Crossref

Abstract

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that…
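The objective stated in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it only evaluates the quantity being maximized, namely the mutual information retained after rows and columns are merged into clusters. The function names, the toy table, and the cluster assignments below are all hypothetical, chosen for illustration.

```python
import math

def mutual_information(table):
    """Mutual information (in bits) of the joint distribution obtained
    by normalizing a table of nonnegative counts."""
    total = sum(sum(row) for row in table)
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(col) / total for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing
                p = count / total
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi

def aggregate(table, row_clusters, col_clusters):
    """Sum the counts of all cells that fall into the same
    (row cluster, column cluster) pair."""
    k = max(row_clusters) + 1
    l = max(col_clusters) + 1
    out = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            out[row_clusters[i]][col_clusters[j]] += count
    return out

# Hypothetical 4x4 co-occurrence table with two obvious blocks.
table = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]

# A co-clustering that follows the blocks, and one that cuts across them.
good = aggregate(table, [0, 0, 1, 1], [0, 0, 1, 1])
bad = aggregate(table, [0, 1, 0, 1], [0, 1, 0, 1])

print(mutual_information(table))  # information in the original table
print(mutual_information(good))   # information kept by the block-aligned clustering
print(mutual_information(bad))    # information kept by the misaligned clustering
```

On this toy table the block-aligned clustering keeps the full 1 bit of mutual information present in the original table, while the misaligned one keeps none. A co-clustering method in the sense of the abstract would search over assignments, for a fixed number of row and column clusters, to make the retained mutual information as large as possible.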

Citation impact

Total citations: 1,010

FWCI: 25.84
Percentile: 100%
References: 29

Authors

3

Topics & keywords

Keywords

Cluster analysis
Contingency table
Row and column spaces
Computer science
Biclustering
Mutual information
Data mining
Row

Funding

National Aeronautics and Space Administration