Is Cosine-Similarity of Embeddings Really About Similarity?
Netflix (United States) · Cornell University
Abstract
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless 'similarities.' For some linear models the…
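As a minimal sketch of the definition in the abstract's first sentence (not code from the paper), the Python below computes cosine-similarity as a normalized dot product. It then illustrates one way the two measures can disagree. The toy vectors and the per-dimension scaling factors `d` are hypothetical values chosen for illustration: multiplying item embeddings by `d` and dividing user embeddings by `d` leaves every user-item dot product unchanged, yet the cosine-similarity between the two items changes.

```python
import math

def dot(u, v):
    """Unnormalized dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: the dot product of their normalizations."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical 2-dimensional embeddings (illustrative values only).
user = [1.0, 2.0]
item_a = [3.0, 1.0]
item_b = [1.0, 3.0]

# Hypothetical per-dimension rescaling: items are multiplied by d, users divided by d.
d = [4.0, 0.25]
user_s = [x / s for x, s in zip(user, d)]
item_a_s = [x * s for x, s in zip(item_a, d)]
item_b_s = [x * s for x, s in zip(item_b, d)]

# User-item dot products are identical before and after rescaling.
print(dot(user, item_a), dot(user_s, item_a_s))
print(dot(user, item_b), dot(user_s, item_b_s))

# Item-item cosine-similarity is not: it depends on the chosen scaling.
cos_before = cosine_similarity(item_a, item_b)
cos_after = cosine_similarity(item_a_s, item_b_s)
print(cos_before, cos_after)
```

Because the scaling is a free choice that the dot products cannot detect, any cosine-similarity computed from such embeddings reflects that choice as well as the data. This is offered only as an illustration of the abstract's concern, under the assumptions stated above.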
Citation impact
- FWCI: 36.50
- Percentile: 100%
- References: 3
Authors (3)
Topics & keywords
- Similarity (geometry)
- Cosine similarity
- Computer science
- Trigonometric functions
- Artificial intelligence
- Mathematics
- Pattern recognition (psychology)