An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS
Intel (United States) · Intel (India)
Abstract
This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS…
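The headline throughput can be sanity-checked from the figures in the abstract. The sketch below is a back-of-the-envelope calculation, not the authors' method: the tile count, the FPMAC count per tile, and the 4 GHz design clock come from the abstract, while counting each FPMAC as two floating-point operations per cycle (one multiply plus one accumulate) is an assumption, and the function and constant names are illustrative.

```python
def peak_flops(tiles, fpmacs_per_tile, flops_per_fpmac_cycle, clock_hz):
    """Theoretical peak throughput in FLOPS, assuming every FPMAC issues each cycle."""
    return tiles * fpmacs_per_tile * flops_per_fpmac_cycle * clock_hz


TILES = 80                   # from the abstract
FPMACS_PER_TILE = 2          # from the abstract
FLOPS_PER_FPMAC_CYCLE = 2    # assumption: one multiply + one accumulate per cycle
DESIGN_CLOCK_HZ = 4e9        # design target from the abstract

# Peak throughput at the 4 GHz design target.
peak_at_design_clock = peak_flops(
    TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_CYCLE, DESIGN_CLOCK_HZ
)

# Lowest clock frequency at which the same array reaches 1.0 TFLOPS.
flops_per_cycle = TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_CYCLE
min_clock_for_1_tflops = 1.0e12 / flops_per_cycle

print(f"Peak at design clock: {peak_at_design_clock / 1e12:.2f} TFLOPS")
print(f"Clock needed for 1.0 TFLOPS: {min_clock_for_1_tflops / 1e9:.3f} GHz")
```

Under these assumptions the array performs 320 floating-point operations per cycle, so exceeding 1.0 TFLOPS requires a clock somewhat above 3 GHz, which is below the 4 GHz design target.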
Citation impact
- FWCI: 78.14
- Percentile: 100%
- References: 20
Authors
15
Topics & keywords
- CMOS
- Chip
- Computer science
- Transistor
- Tile
- Computer hardware
- Parallel computing
- Embedded system
- Affordable and clean energy