Article · Nature Communications · Feb 28, 2024 · Gold OA

Data leakage inflates prediction performance in connectome-based machine learning models

Yale University · Northeastern University

Indexed in: Crossref, DOAJ, PubMed

Abstract

Predictive modeling is a central technique in neuroimaging to identify brain-behavior relationships and test their generalizability to unseen data. However, data leakage undermines the validity of predictive models by breaching the separation between training and test data. Leakage is always an incorrect practice but still pervasive in machine learning. Understanding its effects on neuroimaging predictive models can inform how leakage affects existing literature. Here, we investigate the effects of five forms of leakage (involving feature selection, covariate correction, and dependence between subjects) on functional and structural connectome-based machine learning models across four datasets and three…
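To make the feature-selection form of leakage concrete, here is a minimal toy sketch in plain Python. It is not the paper's pipeline: the data are pure noise, the "model" is a simple sign-weighted sum of the selected features, and every name in it (`select_features`, `held_out_r`, the sample sizes) is an assumption chosen for illustration. The only difference between the two runs is whether features are selected using all subjects (leaky) or training subjects only (clean).

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def select_features(X, y, rows, k):
    """Pick the k features most correlated with y, using only `rows`.

    Returns a list of (feature index, sign of the correlation).
    """
    y_sub = [y[i] for i in rows]
    scored = []
    for j in range(len(X[0])):
        r = pearson([X[i][j] for i in rows], y_sub)
        scored.append((abs(r), j, 1.0 if r >= 0 else -1.0))
    scored.sort(reverse=True)
    return [(j, sign) for _, j, sign in scored[:k]]


def held_out_r(X, y, selected, test_rows):
    """Correlate a sign-weighted sum of the selected features with y on test rows."""
    preds = [sum(sign * X[i][j] for j, sign in selected) for i in test_rows]
    return pearson(preds, [y[i] for i in test_rows])


# Hypothetical toy data: features and target are independent noise,
# so an honest evaluation should find no real predictive signal.
rng = random.Random(0)
n_subjects, n_features, k = 60, 2000, 20
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
y = [rng.gauss(0, 1) for _ in range(n_subjects)]
train_rows = list(range(0, 30))
test_rows = list(range(30, 60))

# Leaky: feature selection sees the test subjects.
leaky = select_features(X, y, train_rows + test_rows, k)
# Clean: feature selection sees training subjects only.
clean = select_features(X, y, train_rows, k)

leaky_r = held_out_r(X, y, leaky, test_rows)
clean_r = held_out_r(X, y, clean, test_rows)
print(f"leaky r = {leaky_r:.2f}, clean r = {clean_r:.2f}")
```

Because the target here is unrelated to every feature, any sizable test-set correlation in the leaky run is an artifact of selecting features with the test subjects in view. The clean run should stay much closer to zero.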

Citation impact

Total citations: 138
FWCI: 43.24
Percentile: 100%
References: 78

Authors

5

Topics & keywords

Keywords
  • Generalizability theory
  • Connectome
  • Leakage (economics)
  • Computer science
  • Machine learning
  • Covariate
  • Artificial intelligence
  • Neuroimaging

Funding