Semi-supervised Clustering by Seeding

Basu, Sugato; Banerjee, Arindam; Mooney, Raymond J.

Article · Jul 8, 2002 · Closed access

Abstract

Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data to generate initial seed clusters, as well as the use of constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.
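The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes squared Euclidean distance, at least one labeled seed per cluster, and an empty cluster keeping its previous centroid. The function name `seeded_kmeans` and its parameters (`seed_idx`, `seed_labels`, `constrained`) are hypothetical names chosen for illustration. With `constrained=False` the labeled points only initialize the centroids; with `constrained=True` their labels are also held fixed during every assignment step.

```python
import numpy as np

def seeded_kmeans(X, seed_idx, seed_labels, k, constrained=False, n_iter=100):
    """Illustrative KMeans whose centroids are initialized from labeled seeds.

    X: (n, d) data matrix.
    seed_idx: indices of the labeled points in X.
    seed_labels: their cluster labels in 0..k-1 (assumed: >= 1 seed per cluster).
    constrained: if True, seed points always keep their given labels.
    """
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)

    # Initial centroids: mean of the seed points of each cluster.
    centroids = np.stack(
        [X[seed_idx[seed_labels == j]].mean(axis=0) for j in range(k)]
    )

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if constrained:
            # Seed labels act as hard constraints and are never reassigned.
            new_labels[seed_idx] = seed_labels
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute centroids; an empty cluster keeps its old one.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy usage: two well-separated groups, one labeled seed in each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = seeded_kmeans(X, seed_idx=[0, 3], seed_labels=[0, 1], k=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Because the seeds fix which cluster index corresponds to which category, the output labels are directly comparable to the supplied labels, which is not the case under random initialization.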

Citation impact

Total citations: 806

FWCI: 12.10
Percentile: 100%
References: 0

Keywords

Seeding
Cluster analysis
Computer science
Artificial intelligence
Machine learning
Engineering
