Article · Applied Artificial Intelligence · May 1, 2003 · Closed access

An analysis of four missing data treatment methods for supervised learning

Universidade de São Paulo

Indexed in: Crossref, DOAJ

Abstract

One relevant problem in data quality is missing data. Despite the frequent occurrence and the relevance of the missing data problem, many machine learning algorithms handle missing data in a rather naive way. However, missing data should be treated carefully; otherwise, bias might be introduced into the induced knowledge. In this work, we analyze the use of the k-nearest neighbor algorithm as an imputation method. Imputation denotes a procedure that replaces the missing values in a data set with plausible values. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used. This allows the user to select the most suitable imputation method…
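
To make the idea of k-nearest neighbor imputation concrete, the following is a minimal Python sketch; it is not the authors' implementation. The abstract does not specify the distance function or how neighbor values are combined, so the choices here are assumptions: a root-mean-square distance over attributes observed in both rows, and the mean of the neighbors' values as the imputed value. The names `distance` and `knn_impute` are hypothetical.

```python
import math


def distance(a, b):
    """Root-mean-square difference over attributes observed in both rows.

    Assumption: the abstract does not name a distance function; this one
    simply ignores attributes that are missing (None) in either row.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attributes, so the rows are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    attribute over the k nearest rows in which the attribute is observed."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where attribute j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            # Drop rows that share no observed attribute with this row.
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the value missing if no donor is available
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data: the second row is missing its second attribute.
data = [
    [1.0, 2.0],
    [1.1, None],
    [0.9, 1.8],
    [5.0, 9.0],
]
filled = knn_impute(data, k=2)
```

In this toy example the two nearest rows to the incomplete row are the first and third, so the missing value is filled with the mean of 2.0 and 1.8, approximately 1.9. Because imputation runs as a separate preprocessing step, the completed data can then be passed to any learning algorithm, which is the independence the abstract highlights.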

Citation impact

Total citations: 885
FWCI: 16.86
Percentile: 100%
References: 10

Authors

2

Topics & keywords

Keywords
  • Missing data
  • Imputation (statistics)
  • Computer science
  • Data mining
  • k-nearest neighbors algorithm
  • Data set
  • Machine learning
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education