Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA / Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca. - ELECTRONIC. - (2016), pp. 376-383. (Paper presented at the 2016 ACM International Conference on Computing Frontiers, held in Como, Italy, 16-19 May 2016) [10.1145/2903150.2911715].

Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA

CONTI, FRANCESCO; LOI, IGOR; BENINI, LUCA
2016

Abstract

Convolutional Neural Networks (CNNs) have achieved outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism of this computation kernel makes it well suited to acceleration on parallel hardware. In this paper, we propose a highly flexible and scalable architectural template for accelerating CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers a peak performance of up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled to reduce input/output bandwidth to as little as 8 bytes per cycle without degrading the performance of the accelerator in most meaningful use cases.
Year: 2016
Published in: CF'16 Proceedings of the ACM International Conference on Computing Frontiers
Pages: 376-383
Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/572732