Article | IEEE Transactions on Knowledge and Data Engineering | Jun 21, 2011 | Closed access

A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication

Australian National University

Indexed in Crossref

Abstract

Record linkage is the process of matching records from several databases that refer to the same entities. When applied on a single database, this process is known as deduplication. Increasingly, matched data are becoming important in many application areas, because they can contain information that is not available otherwise, or that is too costly to acquire. Removing duplicate records in a single database is a crucial step in the data cleaning process, because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the increasing size of today's databases, the complexity of the matching process becomes one of the major challenges for record linkage and…
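The scaling problem the abstract describes can be made concrete with a small sketch. Comparing every record with every other record in a single database of n records takes n(n - 1)/2 comparisons. One common indexing approach, standard blocking, compares only records that share a blocking key value. The toy records, the field names, and the choice of postcode as the blocking key are assumptions made purely for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def naive_pairs(records):
    """All candidate pairs for deduplication: n * (n - 1) / 2 comparisons."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Candidate pairs restricted to records that share a blocking key value."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"name": "Ann Smith", "postcode": "2600"},
    {"name": "Anne Smith", "postcode": "2600"},
    {"name": "Bob Jones", "postcode": "2601"},
    {"name": "Robert Jones", "postcode": "2601"},
    {"name": "Carol White", "postcode": "2602"},
]

all_pairs = naive_pairs(records)
candidates = blocked_pairs(records, key=lambda r: r["postcode"])
print(len(all_pairs), len(candidates))
```

Blocking trades completeness for speed: true matches whose blocking key values differ (for example, because of a typo in the postcode) are never compared, so the choice of key matters.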

Citation impact

Total citations: 665
FWCI: 50.77
Percentile: 100%
References: 75

Authors

1

Topics & keywords

Keywords
  • Data deduplication
  • Computer science
  • Search engine indexing
  • Record linkage
  • Scalability
  • Matching (statistics)
  • Database
  • Data mining