Project Adam: building an efficient and scalable deep learning training system
Abstract
Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require a large amount of compute cycles. We describe the design and implementation of Adam, a distributed system composed of commodity server machines for training such models, which exhibits world-class performance, scaling, and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the…
Citation impact
- Total citations: 596
- FWCI: 41.66
- Percentile: 100%
- References: 28
Authors
4
Topics & keywords
Topics
Keywords
- Computer science
- Scalability
- Exploit
- Artificial intelligence
- Benchmark (surveying)
- Deep learning
- Machine learning
- Workload