Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study
Indexed in Crossref, PubMed
Abstract
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records;…
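The idea of tree-based imputation described in the abstract can be illustrated with a toy sketch. The code below is a hypothetical, heavily simplified example and not the algorithm evaluated in the paper: it handles a single incomplete variable, uses depth-1 regression trees ("stumps") fitted on bootstrap samples as a stand-in for a full random forest, and fills each missing value with an observed donor value drawn from the matching leaf of a randomly chosen tree. The names `fit_stump` and `rf_impute`, the simulated data, and all parameter values are assumptions made for illustration. A full MICE procedure would cycle such a step over every incomplete variable and repeat it to create several imputed data sets.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(xs, ys, rng):
    """Fit a depth-1 regression tree on a bootstrap sample.

    Returns (threshold, left_donors, right_donors), where the donor lists
    hold the observed y values that fell on each side of the split.
    """
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
    bx = [xs[i] for i in idx]
    by = [ys[i] for i in idx]
    best = None
    # Exclude the largest x so the right-hand leaf is never empty.
    for t in sorted(set(bx))[:-1]:
        left = [y for x, y in zip(bx, by) if x <= t]
        right = [y for x, y in zip(bx, by) if x > t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:  # all x identical: no split is possible
        return (bx[0], by, by)
    return best[1:]


def rf_impute(x, y, n_trees=10, seed=0):
    """Fill None entries of y using donors from randomly chosen stumps."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    xs = [x[i] for i in observed]
    ys = [y[i] for i in observed]
    trees = [fit_stump(xs, ys, rng) for _ in range(n_trees)]
    completed = list(y)
    for i, v in enumerate(y):
        if v is None:
            threshold, left, right = rng.choice(trees)
            donors = left if x[i] <= threshold else right
            completed[i] = rng.choice(donors)
    return completed


# Simulated data with a nonlinear (quadratic) relationship between x and y.
data_rng = random.Random(1)
x = [data_rng.uniform(-3, 3) for _ in range(200)]
y = [xi ** 2 + data_rng.gauss(0, 0.5) for xi in x]
# Remove roughly 30% of y completely at random.
y_miss = [None if data_rng.random() < 0.3 else yi for yi in y]

completed = rf_impute(x, y_miss)
```

Because imputed values are drawn from observed donors rather than from a fitted parametric formula, no functional form for the relationship between `x` and `y` has to be specified, which is the property the abstract attributes to random forest imputation.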
Citation impact
- Total citations: 796
- FWCI: 12.36
- Percentile: 100%
- References: 40
Authors: 5
Topics & keywords
Keywords
- Imputation (statistics)
- Caliber
- Missing data
- Random forest
- Statistics
- Parametric statistics
- Computer science
- Econometrics
Funding
- National Institute for Social Care and Health Research (Award: MR/K006584/1)
- London School of Hygiene and Tropical Medicine
- Wellcome Trust (Awards: MR/K006584/1, 086091, 086091/Z/08/Z)
- Cancer Research UK
- National Institute for Health and Care Research (Award: RP-PG-0407-10314)
- British Heart Foundation
- University College London
- Medical Research Council (Awards: MR/K006584/1, MC_EX_G0800814, K006584/1, MR/K02180X/1, G0902393)
- Engineering and Physical Sciences Research Council
- Economic and Social Research Council (Awards: ES/G026300/1, ES/H022252/1)