Scaling tree-based automated machine learning to biomedical big data with a feature set selector
Indexed in: Crossref, DOAJ, PubMed
Abstract
MOTIVATION: Automated machine learning (AutoML) systems are helpful data science assistants designed to scan data for novel features, select appropriate supervised learning models and optimize their parameters. For this purpose, the Tree-based Pipeline Optimization Tool (TPOT) was developed using strongly typed genetic programming (GP) to recommend an optimized analysis pipeline for the data scientist's prediction problem. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data. RESULTS: We introduce two new features implemented in TPOT that help increase the system's scalability: Feature Set Selector (FSS) and Template. FSS…
Citation impact
- Total citations: 463
- FWCI: 29.02
- Percentile: 100%
- References: 40
Authors: 3
Topics & keywords
Keywords
- Pipeline (software)
- Computer science
- Scalability
- Feature (linguistics)
- Set (abstract data type)
- Big data
- Tree (set theory)
- Data set