HAQ: Hardware-Aware Automated Quantization With Mixed Precision

Wang, Kuan; Liu, Zhijian; Lin, Yujun; Lin, Ji; Han, Song

doi:10.1109/cvpr.2019.00881

Preprint · Jun 1, 2019 · Closed access

Massachusetts Institute of Technology

Indexed in Crossref

Abstract

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computation efficiency, which makes it challenging to find the optimal bitwidth for each layer: domain experts must explore a vast design space, trading off accuracy, latency, energy, and model size, a process that is both time-consuming and sub-optimal. Plenty of specialized hardware exists for neural networks, but little research has addressed neural network optimization specialized for a particular hardware architecture. Conventional quantization algorithms ignore the different…

Citation impact

Total citations: 927

FWCI: 51.34
Percentile: 100%
References: 57

Topics & keywords

Keywords

Computer science
Quantization (signal processing)
Hardware acceleration
Edge device
Computation
Artificial neural network
Computer hardware
Design space exploration

UN Sustainable Development Goals

Affordable and clean energy
