Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA

Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca

doi:10.1145/2903150.2911715

Convolutional Neural Networks (CNNs) have achieved outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism of this computation kernel makes it well suited to acceleration on parallel hardware. In this paper we propose a highly flexible and scalable architectural template for accelerating CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers peak performance of up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled to reduce input/output bandwidth down to 8 bytes per cycle without degrading the accelerator's performance in most meaningful use cases.
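The figures quoted in the abstract can be related to each other with a little arithmetic. The Python sketch below is illustrative only and is not taken from the paper: it derives the number of DSP slices implied by the two throughput figures, then applies the generic roofline bound (attainable throughput is the smaller of peak compute and bandwidth times operational intensity). The clock frequency `ASSUMED_CLOCK_HZ` and the function names are assumptions made for this example; the abstract does not state a clock frequency.

```python
# Figures taken from the abstract.
PEAK_MAC_PER_S = 80e9        # 80 GMAC/s peak performance
MAC_PER_S_PER_DSP = 100e6    # 100 MMAC/s per DSP slice
IO_BYTES_PER_CYCLE = 8       # reduced input/output bandwidth

# ASSUMPTION (not stated in the abstract): one MAC per DSP slice per cycle,
# which would imply a 100 MHz clock. Used only to convert bytes/cycle to bytes/s.
ASSUMED_CLOCK_HZ = 100e6


def implied_dsp_slices(peak_mac_per_s: float, mac_per_s_per_dsp: float) -> float:
    """Number of DSP slices implied by the peak and per-slice throughput."""
    return peak_mac_per_s / mac_per_s_per_dsp


def attainable_mac_per_s(peak_mac_per_s: float, bytes_per_cycle: float,
                         clock_hz: float, macs_per_byte: float) -> float:
    """Generic roofline bound: min(compute roof, bandwidth * operational intensity)."""
    bandwidth_bytes_per_s = bytes_per_cycle * clock_hz
    return min(peak_mac_per_s, bandwidth_bytes_per_s * macs_per_byte)


def ridge_point(peak_mac_per_s: float, bytes_per_cycle: float, clock_hz: float) -> float:
    """Operational intensity (MACs per byte) at which the compute roof is reached."""
    return peak_mac_per_s / (bytes_per_cycle * clock_hz)


if __name__ == "__main__":
    print("implied DSP slices:", implied_dsp_slices(PEAK_MAC_PER_S, MAC_PER_S_PER_DSP))
    print("ridge point (MAC/byte):",
          ridge_point(PEAK_MAC_PER_S, IO_BYTES_PER_CYCLE, ASSUMED_CLOCK_HZ))
    for intensity in (50.0, 200.0):
        bound = attainable_mac_per_s(PEAK_MAC_PER_S, IO_BYTES_PER_CYCLE,
                                     ASSUMED_CLOCK_HZ, intensity)
        print(f"{intensity} MAC/byte -> {bound / 1e9} GMAC/s")
```

Under the assumed clock, a schedule needs to reuse each transferred byte for roughly 100 MACs before the 8 bytes per cycle interface stops being the bottleneck. A different clock frequency would shift that ridge point proportionally.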

Meloni, P., Deriu, G., Conti, F., Loi, I., Raffo, L., Benini, L. (2016). Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA. ACM [10.1145/2903150.2911715].



Year: 2016
Volume title: CF'16 Proceedings of the ACM International Conference on Computing Frontiers
First page: 376
Last page: 383
DOI: https://dx.doi.org/10.1145/2903150.2911715
Citation: Meloni, P., Deriu, G., Conti, F., Loi, I., Raffo, L., Benini, L. (2016). Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA. ACM [10.1145/2903150.2911715].
All authors: Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca
Publication type: 4.01 Contribution in conference proceedings



Use this identifier to cite or link to this document: https://hdl.handle.net/11585/572732

