Article | PLoS ONE | Nov 7, 2019 | Gold OA

Machine learning algorithm validation with a limited sample size

University of Manchester

Indexed in: Crossref, DOAJ, PubMed

Abstract

Advances in neuroimaging, genomics, motion tracking, eye tracking and many other technology-based data collection methods have led to a torrent of high-dimensional datasets, which commonly have a small number of samples because of the intrinsically high cost of data collection involving human participants. High-dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies that have applied ML to distinguish autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we…
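The link between small samples and optimistic accuracy can be illustrated with a toy simulation. The sketch below is not the authors' method; it is a minimal illustration under stated assumptions: features are pure Gaussian noise, labels are balanced and unrelated to the features, and the "classifier" is the sign of the single best noise feature, scored on the same data used to select it. The function names (`best_feature_accuracy`, `mean_apparent_accuracy`) and all parameter values are hypothetical. Because the features carry no signal, the true accuracy is 0.5, so anything above that is selection bias.

```python
import random


def best_feature_accuracy(n_samples, n_features, rng):
    """Apparent accuracy of the best noise feature, scored on the data used to pick it."""
    labels = [i % 2 for i in range(n_samples)]  # balanced, arbitrary class labels
    best = 0.0
    for _ in range(n_features):
        # Each feature is pure noise, independent of the labels.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Classify by the sign of the feature; allow either orientation.
        correct = sum((x > 0) == bool(y) for x, y in zip(feature, labels))
        accuracy = max(correct, n_samples - correct) / n_samples
        best = max(best, accuracy)
    return best


def mean_apparent_accuracy(n_samples, n_features=200, repeats=10, seed=0):
    """Average the biased estimate over several simulated datasets."""
    rng = random.Random(seed)
    total = sum(best_feature_accuracy(n_samples, n_features, rng) for _ in range(repeats))
    return total / repeats


small = mean_apparent_accuracy(n_samples=20)
large = mean_apparent_accuracy(n_samples=400)
print(f"apparent accuracy, n=20:  {small:.2f}")
print(f"apparent accuracy, n=400: {large:.2f}")
```

In this setup the apparent accuracy is well above chance for the small sample and moves toward 0.5 as the sample grows, even though no feature is informative. Evaluating on data that played no part in feature selection would remove this particular bias.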

Citation impact

Total citations: 1,653
FWCI (field-weighted citation impact): 44.26
Percentile: 100%
References: 33

Authors

4

Topics & keywords

Keywords
  • Overfitting
  • Sample size determination
  • Computer science
  • Artificial intelligence
  • Cross-validation
  • Data collection
  • Machine learning
  • Selection bias
UN Sustainable Development Goals
  • Reduced inequalities

Funding