The Google Similarity Distance

Cilibrasi, Rudi; Vitányi, Paul

doi:10.1109/tkde.2007.48

Article, IEEE Transactions on Knowledge and Data Engineering, Feb 9, 2007, Closed access

Centrum Wiskunde & Informatica

Indexed in Crossref

Abstract

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers, the equivalent of "society" is "database," and the equivalent of "use" is "a way to search the database". We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the World Wide Web (WWW) as the database, and Google as the search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the WWW using Google page counts.…
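As a rough illustration of how a distance might be computed from page counts, the sketch below uses the normalized form usually given for the Google similarity distance: NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}). Here f(x) and f(y) are the page counts for each term, f(x, y) is the count for pages containing both, and N is the number of indexed pages. The truncated abstract above does not state this formula, so treat it as an assumption about the method. The function name and the example counts are illustrative only, and no real search API is called.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Sketch of a page-count distance between two search terms.

    fx, fy -- page counts for each term alone (assumed inputs)
    fxy    -- page count for pages containing both terms
    n      -- assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs a positive page count")
    if n <= max(fx, fy):
        raise ValueError("n must exceed both single-term counts")
    if fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf

    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Illustrative counts only; real values depend on the search engine and date.
horse_rider = normalized_google_distance(
    fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_058_044_651
)
print(f"distance = {horse_rider:.3f}")
```

Smaller values indicate terms that tend to appear on the same pages. Because the logarithms appear only in a ratio, the choice of base does not affect the result.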

Citation impact

Total citations: 1,763
FWCI: 122.16
Percentile: 100%
References: 44

Keywords

WordNet
Computer science
Information retrieval
Semantics (computer science)
Similarity (geometry)
Semantic similarity
Context (archaeology)
Cluster analysis

UN Sustainable Development Goals

Quality Education
