Book chapter | Jan 12, 2022 | Gold OA

A Survey of Quantization Methods for Efficient Neural Network Inference

University of California, Berkeley

Indexed in Crossref

Abstract

This chapter surveys approaches to the problem of quantizing the numerical values in deep Neural Network (NN) computations, covering the advantages and disadvantages of current methods. Over the past decade, the accuracy of NNs has improved significantly across a wide range of problems, often through highly over-parameterized models. Achieving efficient, real-time NNs with optimal accuracy requires rethinking the design, training, and deployment of NN models. Model distillation involves training a large model and then using it as a teacher to train a more compact model. Loosely related to NN quantization is work in neuroscience that suggests that the human brain stores…

Citation impact

Total citations: 1,021
FWCI: 125.75
Percentile: 100%
References: 421

Authors

6

Topics & keywords

Keywords
  • Artificial neural network
  • Inference
  • Computer science
  • Quantization (signal processing)
  • Artificial intelligence
  • Algorithm
UN Sustainable Development Goals
  • Quality Education

Funding