Corpus-based and knowledge-based measures of text semantic similarity

Mihalcea, Rada; Corley, Courtney D.; Strapparava, Carlo

Article · University of North Texas Digital Library (University of North Texas) · Jul 16, 2006 · Green OA

University of North Texas · Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare

Abstract

This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on…

Citation impact

Total citations: 1,189
FWCI: 29.34
Percentile: 100%
References: 31

Authors: 3

Topics & keywords

Computer science
Semantic similarity
Paraphrase
Similarity (geometry)
Information retrieval
Natural language processing
Artificial intelligence
Set (abstract data type)

UN Sustainable Development Goals

Quality Education
