Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data

Ertöz, Levent; Steinbach, Michael; Kumar, Vipin

doi:10.1137/1.9781611972733.5

Article · May 1, 2003 · Closed access

University of Minnesota

Abstract

Finding clusters in data, especially high dimensional data, is challenging when the clusters are of widely differing shapes, sizes, and densities, and when the data contains noise and outliers. We present a novel clustering technique that addresses these issues. Our algorithm first finds the nearest neighbors of each data point and then redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share. Using this definition of similarity, our algorithm identifies core points and then builds clusters around the core points. The use of a shared nearest neighbor definition of similarity alleviates problems with varying densities and high dimensionality, while the use of…
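The abstract outlines three steps: find each point's nearest neighbors, measure similarity as the number of shared nearest neighbors, then identify core points and grow clusters around them. The Python sketch below illustrates those steps under stated assumptions. The abstract is truncated and gives no parameters or exact rules, so several pieces are hypothetical choices rather than the authors' specification: Euclidean k-nearest-neighbor lists, a shared-neighbor count that is kept only when two points appear in each other's lists, a DBSCAN-style core-point test, and the parameter names k, eps, and min_pts. The noise handling is also assumed.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn_lists(points: List[Point], k: int) -> List[Set[int]]:
    """For each point, the indices of its k nearest neighbors.

    Assumption: Euclidean distance; a point is not its own neighbor.
    """
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def snn_similarity(nn: List[Set[int]], i: int, j: int) -> int:
    """Number of nearest neighbors shared by points i and j.

    Assumption: the count is kept only when i and j appear in each
    other's neighbor lists; otherwise the similarity is 0.
    """
    if j in nn[i] and i in nn[j]:
        return len(nn[i] & nn[j])
    return 0


def snn_cluster(points: List[Point], k: int, eps: int, min_pts: int) -> List[int]:
    """DBSCAN-style clustering on SNN similarity; returns labels, -1 = noise.

    Hypothetical parameters: k (neighbor-list size), eps (minimum SNN
    similarity for a link), min_pts (minimum link count for a core point).
    """
    n = len(points)
    nn = knn_lists(points, k)
    sim = [[snn_similarity(nn, i, j) if i != j else 0 for j in range(n)]
           for i in range(n)]

    # SNN density: how many other points are at least eps-similar to point i.
    density = [sum(1 for j in range(n) if j != i and sim[i][j] >= eps)
               for i in range(n)]
    core = [i for i in range(n) if density[i] >= min_pts]
    core_set = set(core)

    # Grow clusters by linking core points whose SNN similarity is >= eps.
    labels = [-1] * n
    next_label = 0
    for seed in core:
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and sim[u][v] >= eps:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # Attach each non-core point to its most similar core point if that
    # similarity reaches eps; otherwise leave it labeled as noise (-1).
    for i in range(n):
        if i in core_set or not core:
            continue
        best = max(core, key=lambda c: sim[i][c])
        if sim[i][best] >= eps:
            labels[i] = labels[best]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
           (5, -20)]  # the last point is an isolated outlier
    print(snn_cluster(pts, k=4, eps=3, min_pts=3))
    # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

The outlier at (5, -20) lists points of the first group among its four nearest neighbors, but none of those points lists the outlier in return. Under the mutual-neighbor assumption its shared-neighbor similarity is therefore 0, so it never becomes a core point and is left as noise. This illustrates how shared-neighbor similarity can separate noise from clusters without relying on raw distances.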

Citation impact

Total citations: 706

FWCI: 25.31
Percentile: 100%
References: 34

Keywords

DBSCAN
Cluster analysis
Computer science
Curse of dimensionality
Similarity (geometry)
Data point
Outlier
Data mining

Funding

Deutscher Akademischer Austauschdienst