Bahou, A. A., Karunaratne, G., Andri, R., Cavigelli, L., & Benini, L. (2018). XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks. IEEE. https://doi.org/10.1109/CoolChips.2018.8373076
XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks
Bahou, A. A.; Karunaratne, G.; Andri, R.; Cavigelli, L.; Benini, Luca
2018
Abstract
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory, which precludes the implementation of CNNs in low-power embedded systems. Recent research shows that CNNs can sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32x in memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse, supporting even non-trivial network topologies with large feature map volumes. Implemented in UMC 65 nm technology, XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V.
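The XNOR-and-popcount operation mentioned in the abstract admits a short software illustration. The sketch below is a hypothetical C example, not the paper's hardware design; it assumes GCC/Clang's __builtin_popcountll. It packs 64 binarized values one per bit, with 1 encoding +1 and 0 encoding -1, so a ±1 dot product reduces to an XNOR followed by a population count: with m matching positions out of N, the dot product equals 2m - N.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of a 64-element binary dot product.
 * Values in {-1, +1} are packed one per bit (1 -> +1, 0 -> -1).
 * XNOR marks positions where the signs agree, so with m agreements
 * out of 64 the signed dot product is m - (64 - m) = 2m - 64. */
static int binary_dot64(uint64_t a, uint64_t w) {
    uint64_t xnor = ~(a ^ w);           /* 1 where signs agree */
    int m = __builtin_popcountll(xnor); /* count agreements */
    return 2 * m - 64;                  /* agreements - disagreements */
}

int main(void) {
    uint64_t act = 0xF0F0F0F0F0F0F0F0ULL; /* example packed activations */
    uint64_t wgt = 0xFF00FF00FF00FF00ULL; /* example packed weights */
    printf("dot = %d\n", binary_dot64(act, wgt)); /* prints 0: half the positions agree */
    return 0;
}
```

This illustrates why binarization collapses the multiply-accumulate datapath: a wide sum-of-products becomes one bitwise XNOR plus a popcount, which is the source of the energy savings reported above.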