Abstract

State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power. Previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs…
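The two compression steps named in the abstract, pruning redundant connections and letting several connections share one weight, can be illustrated with a small sketch. This is not the paper's procedure: the magnitude threshold, the evenly spaced codebook, and the function names `prune`, `share_weights`, and `sparse_matvec` are all assumptions chosen for brevity. The final matrix-vector product is included only to show that the compressed form can be used without expanding it back to a dense matrix.

```python
def prune(weights, threshold):
    """Zero out connections whose magnitude is below the threshold (assumed criterion)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def share_weights(weights, levels):
    """Replace each surviving weight with an index into a small shared codebook.

    The codebook is spaced evenly between the smallest and largest nonzero
    weight. This is a simplification for illustration, not the clustering
    used in the paper. Returns the codebook and, for each row, a list of
    (column, codebook_index) pairs for the nonzero entries only.
    """
    nonzero = [w for row in weights for w in row if w != 0.0]
    if not nonzero:
        return [], [[] for _ in weights]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / (levels - 1) if levels > 1 else 0.0
    codebook = [lo + i * step for i in range(levels)]
    rows = []
    for row in weights:
        entries = []
        for col, w in enumerate(row):
            if w != 0.0:
                # Pick the codebook entry closest to the original weight.
                idx = min(range(levels), key=lambda i: abs(codebook[i] - w))
                entries.append((col, idx))
        rows.append(entries)
    return codebook, rows


def sparse_matvec(codebook, rows, x):
    """Multiply the compressed matrix by x, visiting only stored connections."""
    return [sum(codebook[idx] * x[col] for col, idx in entries) for entries in rows]


if __name__ == "__main__":
    dense = [[0.9, 0.05, -0.8],
             [0.0, 0.5, 0.02]]
    pruned = prune(dense, threshold=0.1)
    codebook, rows = share_weights(pruned, levels=3)
    print(codebook)
    print(rows)
    print(sparse_matvec(codebook, rows, [1.0, 1.0, 1.0]))
```

In this form each stored connection costs a column position and a short codebook index instead of a full-precision weight, which is the kind of saving that makes it plausible to keep a model in on-chip SRAM.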

Citation impact

Total citations: 2,031
FWCI: 132.78
Percentile: 100%
References: 55

Authors

7

Topics & keywords

Keywords
  • Computer science
  • Uncompressed video
  • Parallel computing
  • DRAM
  • Static random-access memory
  • Matrix multiplication
  • Throughput
  • Computer engineering
UN Sustainable Development Goals
  • Affordable and clean energy