Article | Oct 1, 2020 | Green OA

Cluster Quality Analysis Using Silhouette Score

University of Maryland, Baltimore County

Indexed in: Crossref, DataCite

Abstract

Clustering is an important phase in data mining. Selecting the number of clusters for a clustering algorithm, e.g. choosing the best value of k for the various k-means algorithms [1], can be difficult. We studied the use of silhouette scores and scatter plots to suggest, and then validate, the number of clusters specified when running the k-means clustering algorithm on two publicly available data sets. Scikit-learn's [4] silhouette score method, which measures the quality of a clustering, was used to find the mean silhouette coefficient over all samples for different numbers of clusters. The number of clusters with the highest silhouette score is taken as optimal. We present several instances of utilizing the…
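
The selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic data from `make_blobs`, the chosen cluster centers, the range of k values, and the helper name `silhouette_by_k` are all assumptions made for the example, since the two data sets used in the paper are not named here. For each sample, the silhouette coefficient is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster; `silhouette_score` returns the mean over all samples.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Hypothetical helper: fit k-means for each k and return {k: mean silhouette}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Mean silhouette coefficient over all samples, in [-1, 1].
        scores[k] = silhouette_score(X, labels)
    return scores


# Assumed toy data: four well-separated Gaussian blobs (not the paper's data).
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=42)

scores = silhouette_by_k(X, range(2, 9))
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette

for k, s in sorted(scores.items()):
    print(f"k={k}: mean silhouette = {s:.3f}")
print("suggested number of clusters:", best_k)
```

The silhouette score is only defined for at least two clusters, so the search starts at k = 2. A scatter plot of the labeled points for the suggested k can then serve as the visual check mentioned in the abstract.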

Citation impact

Total citations: 799
FWCI: 30.87
Percentile: 100%
References: 8

Authors

2

Topics & keywords

Keywords
  • Silhouette
  • Cluster analysis
  • Computer science
  • Pattern recognition (psychology)
  • Cluster (spacecraft)
  • Artificial intelligence
  • Measure (data warehouse)
  • k-means clustering