An analysis of four missing data treatment methods for supervised learning

Batista, Gustavo; Monard, Maria Carolina

doi:10.1080/713827181

Article | Applied Artificial Intelligence | May 1, 2003 | Closed access

Universidade de São Paulo

Indexed in: Crossref, DOAJ

Abstract

One relevant problem in data quality is missing data. Despite the frequent occurrence and relevance of this problem, many machine learning algorithms handle missing data in a rather naive way. Missing data should, however, be treated carefully; otherwise bias might be introduced into the induced knowledge. In this work, we analyze the use of the k-nearest neighbor algorithm as an imputation method. Imputation denotes a procedure that replaces the missing values in a data set with plausible values. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used. This allows the user to select the most suitable imputation method…
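To make the idea of k-nearest neighbor imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes purely numeric features, marks missing values with `None`, measures distance as the root mean squared difference over the features two rows both have observed, and fills each gap with the mean of that feature over the k closest rows. The function name `knn_impute` and the toy data are hypothetical, chosen only for illustration.

```python
from math import sqrt


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None is replaced by the mean of
    that feature over the k nearest rows that have the feature observed.

    Assumptions for this sketch: all features are numeric, and distance is
    the root mean squared difference over features observed in both rows.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # A neighbor must be a different row with feature j observed.
                if other_idx == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
            # If no usable neighbor exists, the value stays None.
    return imputed


# Toy data: the second row is missing its third feature.
data = [
    [1.0, 2.0, 10.0],
    [1.0, 2.0, None],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
]
filled = knn_impute(data, k=2)
```

Because the imputation runs as a preprocessing step on the data set itself, the filled rows can be passed to any learning algorithm, which is the independence the abstract refers to.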

Citation impact

Total citations: 885

FWCI: 16.86
Percentile: 100%
References: 10

Authors: 2

Topics & keywords

Keywords

Missing data
Imputation (statistics)
Computer science
Data mining
k-nearest neighbors algorithm
Data set
Machine learning
Artificial intelligence

UN Sustainable Development Goals

Quality Education
