Article · May 1, 2003 · Closed access

Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data

University of Minnesota

Indexed in Crossref

Abstract

Finding clusters in data, especially high dimensional data, is challenging when the clusters are of widely differing shapes, sizes, and densities, and when the data contains noise and outliers. We present a novel clustering technique that addresses these issues. Our algorithm first finds the nearest neighbors of each data point and then redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share. Using this definition of similarity, our algorithm identifies core points and then builds clusters around the core points. The use of a shared nearest neighbor definition of similarity alleviates problems with varying densities and high dimensionality, while the use of…
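The steps named in the abstract (nearest neighbors, shared nearest neighbor similarity, core points, clusters grown around core points) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract is truncated and gives no parameters, so the names `k`, `eps`, and `min_pts`, the requirement that two points appear in each other's neighbor lists before their similarity counts, and the DBSCAN-style cluster expansion are all assumptions made for illustration.

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared nearest neighbors of points i and j.

    Assumption: the similarity is nonzero only when i and j are in each
    other's neighbor lists. The abstract does not state this detail.
    """
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_cluster(points, k, eps, min_pts):
    """Label points with cluster ids; -1 marks noise.

    k, eps, and min_pts are hypothetical parameter names:
      k       - size of each nearest neighbor list
      eps     - minimum SNN similarity for a "strong" link
      min_pts - minimum number of strong links for a core point
    """
    n = len(points)
    neighbors = knn_sets(points, k)

    # Strong links: pairs whose SNN similarity reaches eps.
    strong = [
        [j for j in range(n)
         if j != i and snn_similarity(neighbors, i, j) >= eps]
        for i in range(n)
    ]
    # Core points: points with at least min_pts strong links.
    is_core = [len(links) >= min_pts for links in strong]

    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a cluster outward from this unlabeled core point
        # (DBSCAN-style expansion, assumed here for illustration).
        labels[i] = cluster_id
        stack = [i]
        while stack:
            u = stack.pop()
            for v in strong[u]:
                if labels[v] == -1:
                    labels[v] = cluster_id
                    if is_core[v]:
                        stack.append(v)
        cluster_id += 1
    return labels
```

A call such as `snn_cluster(points, k=4, eps=3, min_pts=3)` takes a list of coordinate tuples and returns one label per point. Because similarity is counted in shared neighbors rather than raw distance, the same `eps` can apply to dense and sparse regions alike, which is the property the abstract credits for handling varying densities.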

Citation impact

Total citations: 706
FWCI: 25.31
Percentile: 100%
References: 34

Authors

3

Topics & keywords

Keywords
  • DBSCAN
  • Cluster analysis
  • Computer science
  • Curse of dimensionality
  • Similarity (geometry)
  • Data point
  • Outlier
  • Data mining
