Random forest missing data algorithms

Tang, Fei; Ishwaran, Hemant

doi:10.1002/sam.11348

Article, Statistical Analysis and Data Mining: The ASA Data Science Journal, Jun 13, 2017, Green OA

University of Miami

Indexed in: Crossref, PubMed

Abstract

Random forest (RF) missing data algorithms are an attractive approach for imputing missing data. They can handle mixed types of missing data, they adapt to interactions and nonlinearity, and they have the potential to scale to big data settings. Currently there are many different RF imputation algorithms, but relatively little guidance about their efficacy. Using a large, diverse collection of data sets, imputation performance of various RF algorithms was assessed under different missing data mechanisms. Algorithms included proximity imputation, on-the-fly imputation, and imputation utilizing multivariate unsupervised and supervised splitting, the latter class…

Citation impact

Total citations: 763
FWCI: 64.80
Percentile: 100%
References: 25

Keywords

Random forest
Computer science
Missing data
Algorithm
Artificial intelligence
Machine learning

UN Sustainable Development Goals

Life on Land

Funding

National Institutes of Health
Award: R01CA163739