Cluster Quality Analysis Using Silhouette Score
University of Maryland, Baltimore County
Abstract
Clustering is an important phase in data mining. Selecting the number of clusters in a clustering algorithm, e.g. choosing the best value of k in the various k-means algorithms [1], can be difficult. We studied the use of silhouette scores and scatter plots to suggest, and then validate, the number of clusters we specified when running the k-means clustering algorithm on two publicly available data sets. Scikit-learn's [4] silhouette score method, which measures clustering quality, was used to find the mean silhouette coefficient of all samples for different numbers of clusters. The number of clusters that yields the highest silhouette score is taken as optimal. We present several instances of utilizing the…
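The selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic data from `make_blobs`, the candidate range of k, the helper name `best_k_by_silhouette`, and the `n_init` and `random_state` settings are all assumptions made for the example. For a sample i, the silhouette coefficient is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster; scikit-learn's `silhouette_score` returns the mean of this value over all samples.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Hypothetical helper: fit k-means for each k and score it.

    Returns (best_k, scores), where scores maps each k to the mean
    silhouette coefficient over all samples.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # The k with the highest mean silhouette coefficient is preferred.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Assumed synthetic data: three compact, well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10.0, 0.0], [0.0, 10.0], [10.0, 0.0]],
    cluster_std=0.5,
    random_state=0,
)

# The silhouette score needs at least 2 clusters, so start at k = 2.
best_k, scores = best_k_by_silhouette(X, range(2, 7))

for k, score in sorted(scores.items()):
    print(f"k={k}: mean silhouette = {score:.3f}")
print("suggested number of clusters:", best_k)
```

On real data the peak is often less sharp than on synthetic blobs, which is presumably why the abstract pairs the score with scatter plots as a visual check.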
Citation impact
- FWCI: 30.87
- Percentile: 100%
- References: 8
Authors
2
Topics & keywords
- Silhouette
- Cluster analysis
- Computer science
- Pattern recognition (psychology)
- Cluster (spacecraft)
- Artificial intelligence
- Measure (data warehouse)
- k-means clustering