DeepSpeed
Microsoft (United States) · Bellevue Hospital Center
Abstract
Explore new techniques in DeepSpeed, Microsoft's open-source library that advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch. One piece of our library, called ZeRO, is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), which at the time of its release was the largest publicly known language model, at 17 billion parameters. In addition, we will go over our…
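To give a rough sense of why partitioning optimizer state reduces per-device resources, the sketch below estimates model-state memory per GPU. It is a back-of-the-envelope illustration, not DeepSpeed code: the 16-bytes-per-parameter figure for mixed-precision Adam (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state) is an assumption not stated in the abstract, the function name and the 64-GPU count are hypothetical, and only optimizer state is partitioned here, which is a simplification of what ZeRO can do.

```python
def per_gpu_model_state_bytes(num_params, num_gpus, partition_optimizer_states=True):
    """Estimate bytes of model state held on each GPU under data parallelism.

    Assumes mixed-precision Adam (an assumption, not taken from the abstract):
    fp16 weights (2 B), fp16 gradients (2 B), and fp32 optimizer state
    (master weights, momentum, variance: 12 B) per parameter.
    """
    fp16_params = 2 * num_params
    fp16_grads = 2 * num_params
    optimizer_states = 12 * num_params

    # With partitioning, each GPU keeps only its 1/num_gpus share of the
    # optimizer state instead of a full replica.
    if partition_optimizer_states:
        optimizer_states = optimizer_states / num_gpus

    return fp16_params + fp16_grads + optimizer_states


if __name__ == "__main__":
    params = 17_000_000_000  # Turing-NLG scale, per the abstract
    gpus = 64                # hypothetical cluster size
    replicated = per_gpu_model_state_bytes(params, gpus, partition_optimizer_states=False)
    partitioned = per_gpu_model_state_bytes(params, gpus)
    print(f"replicated:  {replicated / 1e9:.1f} GB per GPU")
    print(f"partitioned: {partitioned / 1e9:.1f} GB per GPU")
```

Under these assumptions the replicated case needs far more memory per GPU than the partitioned case, and the gap grows with the number of GPUs. Activations, buffers, and fragmentation are ignored.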
Citation impact
- FWCI: 28.15
- Percentile: 100%
- References: 2
Authors: 4
Topics & keywords
- Computer science
- Turing
- Usability
- Massively parallel
- Open source
- Artificial intelligence
- Turing machine
- Programming language