Article · ACM SIGARCH Computer Architecture News · Jun 24, 2017 · Bronze OA

In-Datacenter Performance Analysis of a Tensor Processing Unit

Google (United States)

Indexed in Crossref

Abstract

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), that has been deployed in datacenters since 2015 and accelerates the inference phase of neural networks (NN). The heart of the TPU is a matrix multiply unit of 65,536 8-bit MACs that offers a peak throughput of 92 TeraOps/second (TOPS), together with a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs, which help average throughput more than guaranteed latency.…
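The 92 TOPS peak follows from simple arithmetic. Each MAC performs one multiply and one add per cycle, so peak throughput is roughly MAC count × 2 × clock rate. The sketch below assumes a 700 MHz clock, which this abstract does not state. The helper name `peak_tops` is hypothetical and used only for illustration.

```python
# Peak-throughput arithmetic for a matrix unit built from 8-bit MACs.
# ASSUMPTION: a 700 MHz clock; the abstract does not give the clock rate.

MAC_COUNT = 65_536   # from the abstract: 65,536 8-bit MACs
OPS_PER_MAC = 2      # one multiply + one add per MAC per cycle
CLOCK_HZ = 700e6     # assumed clock frequency in Hz


def peak_tops(mac_count: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak throughput in tera-operations per second (hypothetical helper)."""
    return mac_count * ops_per_mac * clock_hz / 1e12


if __name__ == "__main__":
    print(f"peak: {peak_tops(MAC_COUNT, CLOCK_HZ):.2f} TOPS")
```

With these inputs the result is about 91.75 TOPS, which rounds to the 92 TOPS quoted above. The result scales linearly with the assumed clock, so a different clock rate changes the peak proportionally. Sustained throughput on real workloads depends on utilization and is not captured by this peak figure.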

Citation impact
  • Total citations: 1,304
  • FWCI: 151.83
  • Percentile: 100%
  • References: 43

Authors

76 authors

Topics & keywords

Keywords
  • Computer science
  • Central processing unit
  • Parallel computing
  • Application-specific integrated circuit
  • Throughput
  • Embedded system
  • Computer hardware
  • Operating system
UN Sustainable Development Goals
  • Affordable and clean energy