Article
IEEE Transactions on Knowledge and Data Engineering
Feb 9, 2007
Closed access

The Google Similarity Distance

Centrum Wiskunde & Informatica

Indexed in Crossref

Abstract

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers, the equivalent of "society" is "database," and the equivalent of "use" is "a way to search the database". We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the World Wide Web (WWW) as the database, and Google as the search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the WWW using Google page counts.…
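The abstract above is truncated and does not spell out the formula. As a minimal sketch, assuming the normalized form usually quoted for this measure, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), the distance can be computed from three page counts and an index size. The function name `ngd`, its parameter names, and the handling of zero counts are assumptions made for illustration. The counts in the usage line are illustrative figures, not live query results.

```python
import math


def ngd(f_x, f_y, f_xy, n):
    """Sketch of a normalized distance computed from page counts.

    f_x, f_y : pages containing term x, term y (hypothetical inputs)
    f_xy     : pages containing both terms
    n        : assumed total number of indexed pages
    """
    # Assumption: terms that never occur, or never co-occur, are treated
    # as infinitely far apart.
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        return math.inf

    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    denominator = math.log(n) - min(log_x, log_y)
    if denominator <= 0:
        raise ValueError("n must exceed the smaller of the two page counts")

    return (max(log_x, log_y) - log_xy) / denominator


# Illustrative counts for two related terms and an assumed index size.
print(round(ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651), 3))  # prints 0.443
```

Values near 0 indicate terms that almost always appear together, and larger values indicate terms that rarely co-occur. The base of the logarithm cancels in the ratio, so natural logarithms are used here only for convenience.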

Citation impact

Total citations
1,763
FWCI
122.16
Percentile
100%
References
44

Authors

2

Topics & keywords

Keywords
  • WordNet
  • Computer science
  • Information retrieval
  • Semantics (computer science)
  • Similarity (geometry)
  • Semantic similarity
  • Context (archaeology)
  • Cluster analysis
UN Sustainable Development Goals
  • Quality Education