Nearest neighbor imputation algorithms: a critical evaluation

Beretta, Lorenzo; Santaniello, Alessandro

doi:10.1186/s12911-016-0318-z

Article | BMC Medical Informatics and Decision Making | Jul 1, 2016 | Gold OA

Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico

Indexed in: Crossref, DOAJ, PubMed

Abstract

Background

Nearest neighbor (NN) imputation algorithms are efficient methods for filling in missing data: each missing value in a record is replaced by a value obtained from related cases in the whole set of records. Besides substituting missing data with plausible values that are as close as possible to the true values, imputation algorithms should preserve the original data structure and avoid distorting the distribution of the imputed variable. Despite the efficiency of NN algorithms, little is known about the effect of these methods on data structure.

Methods

Simulations on synthetic datasets with different patterns and degrees of missingness were conducted to evaluate the performance of NN with a single neighbor (1NN) and with k neighbors, either without weighting (kNN) or with weighting (wkNN), in the context of different learning frameworks: plain set, reduced set after ReliefF filtering, bagging, random choice of attributes, and bagging combined with random choice of attributes (a Random-Forest-like method).
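To make the three NN variants concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `knn_impute`, the root-mean-square Euclidean distance over jointly observed attributes, and the inverse-distance weights are all assumptions chosen for illustration, and the abstract does not specify the distance or weighting actually used. The learning frameworks (ReliefF filtering, bagging, random attribute choice) are not covered.

```python
import math

def knn_impute(records, k=1, weighted=False):
    """Fill None entries from the k nearest donor records (illustrative sketch).

    Assumptions, not taken from the paper:
    - distance is the root-mean-square difference over attributes
      observed in both the target and the donor;
    - weighted=True uses inverse-distance weights.
    k=1 corresponds to 1NN, k>1 to kNN, and k>1 with weighted=True to wkNN.
    """
    imputed = [list(r) for r in records]  # copy; the input is left untouched
    for i, target in enumerate(records):
        for j, value in enumerate(target):
            if value is not None:
                continue
            candidates = []
            for d, donor in enumerate(records):
                # A donor must be a different record with attribute j observed.
                if d == i or donor[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(target, donor)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common attributes, distance undefined
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, donor[j]))
            if not candidates:
                continue  # no usable donor: leave the value missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if weighted:
                # Small constant avoids division by zero for identical records.
                weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
            else:
                weights = [1.0] * len(nearest)
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            imputed[i][j] = total / sum(weights)
    return imputed

# Toy data: the second record is missing its third attribute.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 50.0],
    [1.2, 1.9, 12.0],
]
print(knn_impute(data, k=1)[1][2])                 # 1NN: copies the closest donor, 10.0
print(knn_impute(data, k=2)[1][2])                 # kNN: mean of the two closest donors, 11.0
print(knn_impute(data, k=2, weighted=True)[1][2])  # wkNN: between 10 and 11, pulled toward 10
```

On this toy set, 1NN copies a single donor value, while kNN and wkNN average over donors. Averaging shrinks the imputed values toward the donors' mean, which is one way an imputation method can alter the distribution of the imputed variable.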

Citation impact

Total citations: 686

FWCI: 28.63
Percentile: 100%
References: 25

Keywords

Imputation (statistics)
Missing data
Computer science
Weighting
Random forest
Data mining
k-nearest neighbors algorithm
Resampling
