A Survey of Quantization Methods for Efficient Neural Network Inference

Gholami, Amir; Kim, Sehoon; Dong, Zhen; Yao, Zhewei; Mahoney, Michael W.; Keutzer, Kurt

doi:10.1201/9781003162810-13

Book chapter · Jan 12, 2022 · Gold OA

University of California, Berkeley

Indexed in Crossref

Abstract

This chapter surveys approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages and disadvantages of current methods. Over the past decade, researchers have observed significant improvements in the accuracy of Neural Networks (NNs) for a wide range of problems, often achieved by highly over-parameterized models. Achieving efficient, real-time NNs with optimal accuracy requires rethinking the design, training, and deployment of NN models. Model distillation involves training a large model and then using it as a teacher to train a more compact model. Loosely related to NN quantization is work in neuroscience that suggests that the human brain stores…
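The following is a minimal sketch of uniform (affine) quantization, the basic operation behind "quantizing the numerical values" of an NN. It assumes a clipping range [alpha, beta] taken from the minimum and maximum of the input. It maps each real value r to an unsigned b-bit integer via Q(r) = round(r/S) - Z, with scale S = (beta - alpha) / (2^b - 1) and integer zero point Z. It then recovers an approximation as S * (Q(r) + Z). The function names, the min/max calibration choice, and the unsigned target range are illustrative assumptions, not code taken from the chapter.

```python
def quantize_params(alpha, beta, bits=8):
    """Return (scale S, zero point Z) for the clipping range [alpha, beta].

    Assumes alpha < beta. Z is chosen so that Q(alpha) == 0 under
    Q(r) = round(r / S) - Z.
    """
    levels = 2 ** bits - 1
    scale = (beta - alpha) / levels
    zero_point = round(alpha / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, bits=8):
    """Map real values to integers in [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1
    out = []
    for r in values:
        q = round(r / scale) - zero_point
        out.append(min(max(q, 0), qmax))  # clamp to the unsigned b-bit range
    return out


def dequantize(qs, scale, zero_point):
    """Recover approximate real values: r_hat = S * (Q(r) + Z)."""
    return [scale * (q + zero_point) for q in qs]


# Usage: quantize a small list of hypothetical weights to 8 bits and back.
weights = [-0.9, -0.25, 0.0, 0.3, 0.75]
S, Z = quantize_params(min(weights), max(weights), bits=8)
q = quantize(weights, S, Z)
approx = dequantize(q, S, Z)
```

For inputs inside the clipping range, the reconstruction error is at most about S/2, plus up to one extra step where clamping applies. Lowering `bits` reduces the number of levels and widens S, which increases that error. Choosing a clipping range tighter than the raw min/max shrinks S, at the cost of clipping outliers.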

Citation impact

Total citations: 1,021

FWCI: 125.75
Percentile: 100%
References: 421

Authors: 6

Topics & keywords

Keywords

Artificial neural network
Inference
Computer science
Quantization (signal processing)
Artificial intelligence
Algorithm

UN Sustainable Development Goals

Quality Education
