Comparison of Performance of Data Imputation Methods for Numeric Dataset
Symbiosis International University
Abstract
Missing data is a common problem faced by researchers and data scientists, and it must be handled appropriately to obtain accurate results from data analysis. The objective of this paper is to provide a better understanding of data missingness mechanisms and data imputation methods, and to assess the performance of widely used imputation methods for numeric datasets. The results should help practitioners and data scientists select an appropriate imputation method for numeric datasets when performing data mining tasks. In this paper, we comprehensively compare seven data imputation methods, namely mean imputation, median imputation, kNN imputation, predictive mean matching, Bayesian…
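As a rough illustration of the kind of comparison the abstract describes, the sketch below scores three of the named methods (mean, median, and kNN imputation) by root mean squared error on held-out values. It is not the paper's experimental setup: the synthetic three-column dataset, the 20% missing-completely-at-random rate applied to a single column, the choice of k = 5, and all function names are assumptions made for this example, and only the Python standard library is used.

```python
import math
import random
import statistics

def make_data(n_rows=200, seed=0):
    """Synthetic numeric dataset with three correlated columns (illustrative assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.gauss(0, 1)
        rows.append([x, 2 * x + rng.gauss(0, 0.3), -x + rng.gauss(0, 0.3)])
    return rows

def mask_mcar(rows, col=1, rate=0.2, seed=1):
    """Blank out values in one column completely at random; return the copy and masked row indices."""
    rng = random.Random(seed)
    masked = [row[:] for row in rows]
    missing = []
    for i, row in enumerate(masked):
        if rng.random() < rate:
            row[col] = None
            missing.append(i)
    return masked, missing

def impute_constant(masked, col, stat):
    """Fill every missing value in `col` with one statistic (mean or median) of the observed values."""
    fill = stat([r[col] for r in masked if r[col] is not None])
    return [r[:col] + [fill] + r[col + 1:] if r[col] is None else r[:] for r in masked]

def impute_knn(masked, col, k=5):
    """Fill each missing value with the mean of `col` over the k nearest complete rows."""
    donors = [r for r in masked if r[col] is not None]
    result = []
    for r in masked:
        if r[col] is not None:
            result.append(r[:])
            continue

        def dist(d, r=r):
            # Euclidean distance over every column except the one being imputed.
            return math.sqrt(sum((a - b) ** 2
                                 for j, (a, b) in enumerate(zip(r, d)) if j != col))

        nearest = sorted(donors, key=dist)[:k]
        fill = statistics.fmean([d[col] for d in nearest])
        result.append(r[:col] + [fill] + r[col + 1:])
    return result

def rmse(truth, imputed, missing, col):
    """Root mean squared error, computed only over the entries that were masked."""
    return math.sqrt(sum((truth[i][col] - imputed[i][col]) ** 2 for i in missing) / len(missing))

COL = 1
truth = make_data()
masked, missing = mask_mcar(truth, col=COL)
candidates = {
    "mean": impute_constant(masked, COL, statistics.fmean),
    "median": impute_constant(masked, COL, statistics.median),
    "kNN": impute_knn(masked, COL, k=5),
}
scores = {name: rmse(truth, filled, missing, COL) for name, filled in candidates.items()}
for name, score in scores.items():
    print(f"{name:>6} imputation RMSE: {score:.3f}")
```

Because the masked column is strongly correlated with the other two in this synthetic data, kNN should score noticeably better than the constant fills here; on real data with weaker correlations or a different missingness mechanism the ranking may differ.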
Citation impact
- FWCI: 21.46
- Percentile: 100%
- References: 46
Authors: 3
Topics & keywords
- Imputation (statistics)
- Missing data
- Computer science
- Data mining
- Regression
- Bayesian probability
- Mean squared error
- Linear regression