Article | Bioinformatics | Jun 2, 2019 | Hybrid OA

Scaling tree-based automated machine learning to biomedical big data with a feature set selector

University of Pennsylvania

PubMed
Indexed in: Crossref, DOAJ, PubMed

Abstract

MOTIVATION: Automated machine learning (AutoML) systems are helpful data science assistants designed to scan data for novel features, select appropriate supervised learning models and optimize their parameters. For this purpose, the Tree-based Pipeline Optimization Tool (TPOT) was developed using strongly typed genetic programming (GP) to recommend an optimized analysis pipeline for the data scientist's prediction problem. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data.

RESULTS: We introduce two new features implemented in TPOT that help increase the system's scalability: Feature Set Selector (FSS) and Template. FSS…

Citation impact

Total citations: 463
FWCI: 29.02
Percentile: 100%
References: 40

Authors

3

Topics & keywords

Keywords
  • Pipeline (software)
  • Computer science
  • Scalability
  • Feature (linguistics)
  • Set (abstract data type)
  • Big data
  • Tree (set theory)
  • Data set

Funding