Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
University of Pennsylvania · University of Chicago · +1 more institution
Abstract
As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning--pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also…
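To make the idea of searching over tree-shaped pipelines concrete, here is a minimal toy sketch using only the Python standard library. It is not TPOT's implementation and does not use TPOT's API. Everything in it is an assumption made for illustration: the synthetic data, the arithmetic operators standing in for pipeline steps, the threshold classifier, and the simple mutate-and-select loop (the abstract does not say which search method is used). Scores are computed on the training rows only, so they say nothing about generalization.

```python
import random

# Hypothetical toy task: label is 1 when x0 * x1 > 0.25.
def make_data(n, rng):
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, int(x[0] * x[1] > 0.25)))
    return rows

# A pipeline is a tree: a leaf is ("feature", i); an internal node is
# (op_name, left, right). These operators are illustrative stand-ins,
# not real pipeline operators from any library.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: max(a, b),
}

def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return ("feature", rng.randrange(2))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if tree[0] == "feature":
        return x[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def accuracy(tree, rows, threshold):
    hits = sum(int(evaluate(tree, x) > threshold) == y for x, y in rows)
    return hits / len(rows)

def fitness(tree, rows):
    # Best training accuracy over a small grid of decision thresholds.
    return max(accuracy(tree, rows, t / 10) for t in range(21))

def mutate(tree, rng):
    # Replace a randomly chosen subtree with a fresh random one.
    if tree[0] == "feature" or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def search(rows, rng, population=20, generations=10):
    # Seed the population with the single-feature baseline so the
    # search result can never score below it (the top half survives).
    pop = [("feature", 0)] + [random_tree(rng) for _ in range(population - 1)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, rows), reverse=True)
        survivors = pop[: population // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(pop, key=lambda t: fitness(t, rows))

rng = random.Random(0)
train = make_data(100, rng)
baseline = fitness(("feature", 0), train)   # the "basic analysis" stand-in
best = search(train, rng)
best_score = fitness(best, train)
print("baseline:", baseline, "best:", best_score, "tree:", best)
```

Because the baseline tree is kept in the initial population and the top half always survives, the returned tree scores at least as well as the baseline on the training rows; how much better depends on the random seed.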
Citation impact
- FWCI: 33.16
- Percentile: 100%
- References: 29
Authors: 4
Topics & keywords
- Computer science
- Pipeline transport
- Pipeline (software)
- Python (programming language)
- Machine learning
- Tree (set theory)
- Artificial intelligence
- Benchmark (surveying)