The statistics of word cooccurrences: word pairs and collocations

Indexed in DataCite

Abstract

"You shall know a word by the company it keeps!" With this slogan, J. R. Firth drew attention to a fact that language scholars had intuitively known for a long time: In natural language, words are not combined randomly into phrases and sentences, constrained only by the rules of syntax. They have a tendency to appear in certain recurrent combinations. As there are many possible reasons for words to go together, a broad range of linguistic and extra-linguistic phenomena can be found among the recurrent combinations, making them a goldmine of information for linguistics, natural language processing and related fields. There are compound nouns ("black box"), fixed and opaque idioms ("kick the bucket"), lexical…

Citation impact

649 total citations
References: 134

Authors

1

Topics & keywords

Keywords
  • Word (group theory)
  • Natural language processing
  • Linguistics
  • Artificial intelligence
  • Computer science
  • Statistics
  • Mathematics
  • Philosophy
UN Sustainable Development Goals
  • Quality Education