XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks
Benini, Luca
2018
Abstract
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory. This precludes the implementation of CNNs in low-power embedded systems. Recent research shows that CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32x memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse, supporting even non-trivial network topologies with large feature map volumes. Implemented in UMC 65 nm technology, XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V.
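To illustrate how a sum-of-products reduces to XNOR-and-popcount, here is a minimal software sketch. It assumes values in {-1, +1} encoded as single bits (1 for +1, 0 for -1). The function names `pack_bits` and `binary_dot` are hypothetical, chosen for this illustration only. The sketch shows the arithmetic identity; it does not describe the XNORBIN datapath.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 where values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ w_word) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # popcount of matching positions
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    return 2 * matches - n


activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

# Conventional multiply-accumulate, kept as a reference for comparison.
reference = sum(a * w for a, w in zip(activations, weights))
result = binary_dot(pack_bits(activations), pack_bits(weights), len(activations))
print(reference, result)
```

Both computations give the same value. The bitwise version replaces the n multiplications with one XNOR over the packed word and a single population count.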