Article · May 8, 2007 · Closed access

Scaling up all pairs similarity search

Google (United States) · University of California, Irvine

Indexed in Crossref

Abstract

Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine distance) is above a given threshold. We propose a simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning. We show the approach efficiently handles a variety of datasets across a wide setting of similarity thresholds, with large speedups over previous state-of-the-art approaches.
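To make the problem statement concrete, the following is a minimal sketch of exact all-pairs search over sparse vectors using a plain inverted index. It is not the paper's algorithm and includes none of its indexing or optimization strategies; it only illustrates the task of reporting every pair whose cosine similarity reaches a threshold. The names `normalize` and `all_pairs`, the dict-based sparse vector representation, and the use of `>=` for the threshold comparison are assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: dimension -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return {}
    return {dim: w / norm for dim, w in vec.items()}


def all_pairs(vectors, threshold):
    """Return (i, j, similarity) for every pair i < j with cosine >= threshold.

    Illustrative baseline only: each vector is scored against the vectors
    indexed so far, then added to the index itself. No pruning is applied.
    """
    index = defaultdict(list)  # dimension -> list of (vector id, weight)
    results = []
    for vid, raw in enumerate(vectors):
        vec = normalize(raw)
        # Accumulate dot products with earlier vectors sharing a dimension.
        scores = defaultdict(float)
        for dim, w in vec.items():
            for other_id, other_w in index[dim]:
                scores[other_id] += w * other_w
        for other_id, score in scores.items():
            if score >= threshold:
                results.append((other_id, vid, score))
        # Index the current vector so that later vectors can match it.
        for dim, w in vec.items():
            index[dim].append((vid, w))
    return results


if __name__ == "__main__":
    docs = [
        {"apple": 1.0, "pie": 1.0},
        {"apple": 1.0, "pie": 1.0},
        {"car": 1.0},
    ]
    for i, j, sim in all_pairs(docs, 0.9):
        print(i, j, round(sim, 3))
```

Because only vectors sharing at least one nonzero dimension ever meet in the index, pairs with zero similarity are never scored, which is the basic reason inverted indexes suit sparse data.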

Citation impact

Total citations: 677
FWCI: 30.93
Percentile: 100%
References: 25

Authors

3

Topics & keywords

Keywords
  • Scaling
  • Similarity (geometry)
  • Computer science
  • Nearest neighbor search
  • Artificial intelligence
  • Pattern recognition (psychology)
  • Mathematics