Trained Ternary Quantization
Stanford Health Care · Stanford Medicine · Stanford University · Tsinghua University
Abstract
Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that reduces the precision of weights in neural networks to ternary values. This method causes very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet. Our AlexNet model is trained from scratch, which means it is as easy to train as a normal full-precision model. We highlight our trained quantization method that can learn both ternary values and…
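As a rough illustration of what reducing weights to ternary values can look like, here is a minimal sketch in plain Python. The abstract is truncated and does not spell out the quantization rule, so everything specific here is an assumption made for illustration: the function name `ternarize`, the magnitude threshold set as a fraction of the largest weight (`threshold_ratio`), and the two separate scale values `w_pos` and `w_neg`. In a trained scheme those scale values would be learned during training; here they are simply passed in as constants.

```python
def ternarize(weights, threshold_ratio=0.05, w_pos=1.0, w_neg=1.0):
    """Map each weight to one of three values: -w_neg, 0.0, or +w_pos.

    Hypothetical helper for illustration only; the thresholding rule and
    parameter names are assumptions, not taken from the abstract.
    """
    if not weights:
        return []
    # Assumed rule: the cut-off is a fixed fraction of the largest magnitude.
    delta = threshold_ratio * max(abs(w) for w in weights)
    quantized = []
    for w in weights:
        if w > delta:
            quantized.append(w_pos)    # large positive weight
        elif w < -delta:
            quantized.append(-w_neg)   # large negative weight
        else:
            quantized.append(0.0)      # small weight is zeroed out
    return quantized


# Example: one small "layer" of full-precision weights.
layer = [0.8, -0.03, 0.02, -0.6, 0.3]
print(ternarize(layer, threshold_ratio=0.05, w_pos=0.7, w_neg=0.5))
```

Each weight then needs only two bits to identify which of the three values it takes, plus the two scale values per layer, which is where the storage and power savings for mobile deployment would come from under these assumptions.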
Citation impact
- FWCI: not available
- Percentile: not available
- References: 15
Authors
- Chenzhuo Zhu (corresponding author)
  Stanford Health Care, Stanford Medicine, Stanford University, Tsinghua University
- Song Han
  Stanford Health Care, Stanford Medicine, Stanford University, Tsinghua University
- Huizi Mao
  Stanford Health Care, Stanford Medicine, Stanford University, Tsinghua University
- William J. Dally
  Stanford Health Care, Stanford Medicine, Stanford University, Tsinghua University
Topics & keywords
- Ternary operation
- Computer science
- Quantization (signal processing)
- Artificial neural network
- Inference
- Algorithm
- Binary number
- Artificial intelligence