On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping

Millard, Koreen; Richardson, Murray

doi:10.3390/rs70708489

Article · Remote Sensing · Jul 6, 2015 · Gold open access

Carleton University

Indexed in Crossref and DOAJ

Abstract

Random Forest (RF) is a widely used algorithm for classification of remotely sensed data. Through a case study in peatland classification using LiDAR derivatives, we present an analysis of the effects of input data characteristics on RF classifications (including RF out-of-bag error, independent classification accuracy and class proportion error). Training data selection and specific input variables (i.e., image channels) have a large impact on the overall accuracy of the image classification. High-dimensional datasets should be reduced so that only uncorrelated important variables are used in classifications. Although RF is an ensemble approach, independent error assessments should be used to…
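The recommendation to reduce a high-dimensional dataset to uncorrelated, important variables can be illustrated with a simple greedy filter. This is a minimal sketch under assumptions that do not come from the paper: importance scores are supplied from elsewhere (for example, an RF variable-importance measure), correlation is measured with Pearson's r, and the 0.7 cut-off, the function names `pearson_r` and `select_uncorrelated`, and the variable names and values are all hypothetical. It is not presented as the authors' exact procedure.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return cov / (sx * sy)


def select_uncorrelated(
    data: Dict[str, Sequence[float]],
    importance: Dict[str, float],
    max_abs_r: float = 0.7,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Visit variables from most to least important and keep a variable only
    if its |r| with every already-kept variable is at most max_abs_r."""
    kept: List[str] = []
    for name in sorted(importance, key=lambda v: importance[v], reverse=True):
        if all(abs(pearson_r(data[name], data[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical LiDAR-derived variables; the values are invented for illustration.
data = {
    "dtm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "dtm_smoothed": [1.1, 2.0, 2.9, 4.2, 5.0],
    "canopy_height": [3.0, 1.0, 4.0, 1.0, 5.0],
}
importance = {"dtm": 0.40, "dtm_smoothed": 0.35, "canopy_height": 0.25}
print(select_uncorrelated(data, importance))  # ['dtm', 'canopy_height']
```

Here `dtm_smoothed` is dropped because it is almost perfectly correlated (r ≈ 0.997) with the more important `dtm`, while `canopy_height` (r ≈ 0.35 with `dtm`) is kept. Visiting variables in descending importance means that, of any correlated pair, the more informative member survives; a common motivation is that correlated predictors tend to split importance between them, which can obscure which variables actually matter.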

Citation impact

592 total citations
FWCI: 19.83
Percentile: 100%
References: 40
Authors: 2

Topics & keywords

Keywords

Random forest
Computer science
Class (philosophy)
Contextual image classification
Set (abstract data type)
Data set
Spatial analysis
Artificial intelligence

UN Sustainable Development Goals

Life on Land

Funding

Natural Sciences and Engineering Research Council of Canada