Article · May 8, 2007 · Closed access
Scaling up all pairs similarity search
Google (United States) · University of California, Irvine
Indexed in Crossref
Abstract
Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine distance) is above a given threshold. We propose a simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning. We show the approach efficiently handles a variety of datasets across a wide setting of similarity thresholds, with large speedups over previous state-of-the-art approaches.
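To make the problem statement concrete, the following is a minimal Python sketch of an exact all-pairs search over sparse vectors using a plain inverted index. It is an illustrative baseline only, not the optimized algorithm the paper proposes. The names `normalize` and `all_pairs`, the dict-based vector representation, the sample data, and the choice to report pairs scoring at or above the threshold are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse vector (dict: dimension -> weight) to unit length."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {d: w / norm for d, w in vec.items()} if norm else {}


def all_pairs(vectors, threshold):
    """Return {(i, j): cosine} for every i < j with cosine >= threshold.

    Exact baseline (hypothetical helper, not the paper's method): each
    vector is scored against an inverted index of the vectors seen
    before it, so pairs sharing no dimension are never compared.
    """
    index = defaultdict(list)  # dimension -> [(vector id, weight), ...]
    results = {}
    for j, raw in enumerate(vectors):
        vec = normalize(raw)
        scores = defaultdict(float)  # candidate id -> accumulated dot product
        for dim, w in vec.items():
            for i, w_i in index[dim]:
                scores[i] += w * w_i
        # After unit normalization the dot product equals cosine similarity.
        for i, s in scores.items():
            if s >= threshold:
                results[(i, j)] = s
        # Index the current vector only after scoring it, so each pair
        # is considered exactly once.
        for dim, w in vec.items():
            index[dim].append((j, w))
    return results


# Illustrative data: vectors 0 and 1 are nearly identical, vector 2 is not.
docs = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0, "b": 1.0, "c": 0.1},
    {"c": 1.0, "d": 1.0},
]
pairs = all_pairs(docs, threshold=0.9)
```

With these sample vectors, only the pair `(0, 1)` clears the 0.9 threshold. The sketch uses no approximation, but it still accumulates a score for every pair that shares a dimension, which is the cost that indexing and optimization strategies such as those described in the abstract aim to reduce.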
Citation impact
- Total citations: 677
- FWCI: 30.93
- Percentile: 100%
- References: 25
Authors: 3
Topics & keywords
Keywords
- Scaling
- Similarity (geometry)
- Computer science
- Nearest neighbor search
- Artificial intelligence
- Pattern recognition (psychology)
- Mathematics