Article · Operating Systems Design and Implementation · Oct 6, 2014 · Closed access

Project Adam: building an efficient and scalable deep learning training system

Microsoft (United States)

Abstract

Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require a large number of compute cycles. We describe the design and implementation of Adam, a distributed system built from commodity server machines to train such models. Adam exhibits world-class performance, scaling, and task accuracy on visual recognition tasks. It achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the…

Citation impact

  • Total citations: 596
  • FWCI: 41.66
  • Percentile: 100%
  • References: 28

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Scalability
  • Exploit
  • Artificial intelligence
  • Benchmark (surveying)
  • Deep learning
  • Machine learning
  • Workload