Machine learning algorithm validation with a limited sample size
Indexed in: Crossref, DOAJ, PubMed
Abstract
Advances in neuroimaging, genomics, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high-dimensional datasets, which commonly have a small number of samples because of the intrinsic high cost of data collection involving human participants. High-dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies which have applied ML to predict autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we…
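The truncated abstract does not say where the bias comes from. One commonly cited source, consistent with the keywords listed for this paper (overfitting, cross-validation, selection bias), is selecting features on the full dataset before cross-validating. The sketch below is an illustration under stated assumptions, not the authors' method: the data are pure synthetic noise, the classifier is a simple nearest-centroid rule, features are ranked by difference in class means, and all function names (`top_k_features`, `loo_accuracy`, `simulate`) are hypothetical. Because the labels carry no signal, any accuracy well above chance reflects estimation bias only.

```python
import random


def top_k_features(X, y, k):
    """Rank features by absolute difference in class means; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(sum(g1) / len(g1) - sum(g0) / len(g0)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the class whose centroid (on the chosen features) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2
                   for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, with feature selection inside or outside the folds."""
    # Selecting once on ALL samples lets the held-out sample influence the model.
    leaked_features = None if select_inside_fold else top_k_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k) if select_inside_fold else leaked_features
        correct += nearest_centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


def simulate(n_samples=20, n_features=200, k=10, n_repeats=10, seed=0):
    """Average both estimates over several pure-noise datasets (assumed sizes)."""
    rng = random.Random(seed)
    leaky, honest = [], []
    for _ in range(n_repeats):
        X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
             for _ in range(n_samples)]
        y = [i % 2 for i in range(n_samples)]  # balanced labels unrelated to X
        leaky.append(loo_accuracy(X, y, k, select_inside_fold=False))
        honest.append(loo_accuracy(X, y, k, select_inside_fold=True))
    return sum(leaky) / n_repeats, sum(honest) / n_repeats


leaky_acc, honest_acc = simulate()
print(f"selection outside CV: {leaky_acc:.2f}, selection inside CV: {honest_acc:.2f}")
```

In this toy setting the estimate with selection outside the folds should come out far above chance, while the estimate with selection repeated inside each fold should stay near or below 50%. The sample and feature counts are arbitrary choices for the illustration and are not taken from the paper.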
Citation impact
- Total citations: 1,653
- FWCI: 44.26
- Percentile: 100%
- References: 33
Authors: 4
Topics & keywords
Topics
Keywords
- Overfitting
- Sample size determination
- Computer science
- Artificial intelligence
- Cross-validation
- Data collection
- Machine learning
- Selection bias
UN Sustainable Development Goals
- Reduced inequalities