Article | Jun 15, 2017 | Closed access
In-Datacenter Performance Analysis of a Tensor Processing Unit
Indexed in Crossref
Abstract
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), that has been deployed in datacenters since 2015 and accelerates the inference phase of neural networks (NN). The heart of the TPU is a matrix multiply unit of 65,536 8-bit MACs that offers a peak throughput of 92 TeraOps/second (TOPS), together with a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs, which help average throughput more than guaranteed latency.…
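The 92 TOPS peak figure follows from simple arithmetic on the size of the matrix unit. The sketch below assumes two details that this abstract does not state: the 65,536 MACs are arranged as a 256 x 256 array, and the chip runs at a 700 MHz clock. Each MAC is counted as two operations (one multiply and one add) per cycle.

```python
# Peak-throughput arithmetic for the TPU matrix multiply unit.
# Assumptions (not stated in the abstract): a 256 x 256 MAC array
# and a 700 MHz clock; each MAC counts as 2 ops (multiply + add).

ARRAY_DIM = 256      # assumed side length of the square MAC array
CLOCK_HZ = 700e6     # assumed clock frequency in Hz (700 MHz)
OPS_PER_MAC = 2      # one multiply plus one add per MAC per cycle


def peak_tops(num_macs: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak throughput in tera-operations per second (TOPS)."""
    return num_macs * ops_per_mac * clock_hz / 1e12


num_macs = ARRAY_DIM * ARRAY_DIM
print(num_macs)                                 # 65536
print(round(peak_tops(num_macs, CLOCK_HZ), 2))  # 91.75
```

The result, about 91.75 TOPS, rounds to the 92 TOPS quoted above. It is a peak value: it is reached only when every MAC does useful work on every cycle, which real inference workloads rarely sustain.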
Citation impact
- Total citations: 4,389
- FWCI: 406.97
- Percentile: 100%
- References: 56
Authors: 76
Topics & keywords
Keywords
- Computer science
- Central processing unit
- Application-specific integrated circuit
- Parallel computing
- Throughput
- Embedded system
- Computer hardware
- Operating system
UN Sustainable Development Goals
- Affordable and clean energy