Searching for a Robust Neural Architecture in Four GPU Hours
Baidu (China) · University of Technology Sydney
Abstract
Conventional neural architecture search (NAS) approaches are usually based on reinforcement learning or evolutionary strategies and take more than 1000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach that learns to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains thousands of sub-graphs, each of which corresponds to one neural architecture. To avoid traversing every possible sub-graph, we develop a differentiable sampler over the DAG. This sampler is learnable and is optimized by the validation loss after training the sampled architecture. In this way, our approach can be…
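To make the sampling idea in the abstract concrete, here is a minimal sketch of drawing one sub-graph from a small DAG. The abstract does not say how the sampler is built, so this assumes a Gumbel-softmax relaxation, a common choice for differentiable sampling over discrete options. The operation names, the 4-node DAG, the temperature `tau`, and all function names are hypothetical and chosen only for illustration. The gradient update of the logits from the validation loss is not shown.

```python
import math
import random

# Hypothetical candidate operations for each edge (illustrative names only).
CANDIDATE_OPS = ["skip", "conv3x3", "conv5x5", "avg_pool"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, tau, rng):
    """Return (hard_index, soft_probs) for one edge.

    Assumption: Gumbel-softmax sampling. The soft probabilities are what an
    autodiff framework would differentiate; the hard index selects the
    operation that is actually used in the sampled architecture.
    """
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    soft = softmax(noisy)
    hard = max(range(len(soft)), key=soft.__getitem__)
    return hard, soft


def sample_architecture(edge_logits, tau=1.0, seed=0):
    """Sample one operation per edge, i.e. one sub-graph of the DAG."""
    rng = random.Random(seed)
    arch = {}
    for edge, logits in edge_logits.items():
        hard, _ = gumbel_softmax_sample(logits, tau, rng)
        arch[edge] = CANDIDATE_OPS[hard]
    return arch


# A toy DAG with 4 nodes: node j receives an edge from every earlier node i.
NUM_NODES = 4
EDGES = [(i, j) for j in range(1, NUM_NODES) for i in range(j)]

# Learnable logits, one per (edge, operation); all zeros means a uniform sampler.
edge_logits = {edge: [0.0] * len(CANDIDATE_OPS) for edge in EDGES}

arch = sample_architecture(edge_logits, tau=1.0, seed=0)
num_subgraphs = len(CANDIDATE_OPS) ** len(EDGES)
```

In this toy setting the DAG has 6 edges and 4 candidate operations per edge, so it contains 4^6 = 4096 distinct sub-graphs. Sampling one of them per training step avoids evaluating all of them, which is the motivation the abstract gives for a learnable sampler.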
Citation impact
- FWCI: 45.93
- Percentile: 100%
- References: 95
Authors (2)
Topics & keywords
- Computer science
- Directed acyclic graph
- Traverse
- Differentiable function
- Stochastic gradient descent
- Architecture
- Gradient descent
- Artificial neural network