HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
University of California, Berkeley
Abstract
Model size and inference speed/power have become a major challenge in the deployment of neural networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra-low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity…
Citation impact
- FWCI: 20.11
- Percentile: 100%
- References: 89
Authors
- 5
Topics & keywords
- Quantization (signal processing)
- Computer science
- Hessian matrix
- Algorithm
- Artificial neural network
- Inference
- Artificial intelligence
- Mathematics