Review | Environmental Science & Technology | Jun 29, 2023 | Closed access

Machine Learning in Environmental Research: Common Pitfalls and Best Practices

Princeton University

Indexed in: Crossref, PubMed

Abstract

Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to the lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we synthesized literature analysis with our own experience and provided a tutorial-like compilation of common pitfalls along with best practice guidelines for environmental ML research. We identified more than 30 key items and provided evidence-based data analysis based on 148 highly cited research articles to exhibit the misconceptions of terminologies, proper sample size and feature size, data enrichment and feature…

Citation impact

Total citations: 541
FWCI: 62.45
Percentile: 100%
References: 106

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Spurious relationship
  • Feature selection
  • Machine learning
  • Data pre-processing
  • Preprocessor
  • Data science
  • Randomness

Funding