Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA

Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca

doi:10.1145/2903150.2911715

Convolutional Neural Networks (CNNs) have reached outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism in such a computation kernel makes it suitable to be effectively accelerated on parallel hardware. In this paper we propose a highly flexible and scalable architectural template for acceleration of CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers peak performance up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled in order to reduce input/output bandwidth down to 8 bytes per cycle without degrading the performance of the accelerator in most of the meaningful use-cases.
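The figures quoted in the abstract can be related through a simple roofline-style bound. The following is a minimal sketch, not the authors' model: the 80 GMAC/s peak, the 100 MMAC/s per DSP slice, and the 8 bytes per cycle come from the abstract, while the 100 MHz clock and the name `attainable_mmacs` are assumptions introduced here for illustration only.

```python
# Roofline-style sketch relating the abstract's figures.
# Values marked "from the abstract" are quoted; everything else is assumed.

PEAK_MMACS = 80_000.0     # 80 GMAC/s peak, from the abstract
PER_DSP_MMACS = 100.0     # 100 MMAC/s per DSP slice, from the abstract
BYTES_PER_CYCLE = 8       # reduced I/O bandwidth, from the abstract
CLOCK_MHZ = 100.0         # ASSUMPTION: the clock is not stated in the abstract

# Bandwidth in MB/s under the assumed clock frequency.
BANDWIDTH_MBYTES_S = BYTES_PER_CYCLE * CLOCK_MHZ


def attainable_mmacs(macs_per_byte,
                     peak_mmacs=PEAK_MMACS,
                     bandwidth_mbytes_s=BANDWIDTH_MBYTES_S):
    """Return the roofline bound: the lower of the compute and bandwidth limits."""
    return min(peak_mmacs, bandwidth_mbytes_s * macs_per_byte)


# Number of DSP slices implied by the two throughput figures in the abstract.
implied_dsp_slices = PEAK_MMACS / PER_DSP_MMACS

# Operational intensity (MAC per byte) at which the compute roof is reached,
# valid only under the assumed clock frequency.
ridge_macs_per_byte = PEAK_MMACS / BANDWIDTH_MBYTES_S

if __name__ == "__main__":
    print("implied DSP slices:", implied_dsp_slices)
    print("ridge point (MAC/byte):", ridge_macs_per_byte)
    print("bound at 50 MAC/byte (MMAC/s):", attainable_mmacs(50))
```

Under these assumptions, a schedule that reuses each transferred byte for more MACs than the ridge value would be compute-bound, which is the kind of operating point the abstract describes as not degrading performance.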

Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA / Meloni, Paolo; Deriu, Gianfranco; Conti, Francesco; Loi, Igor; Raffo, Luigi; Benini, Luca. - Electronic. - (2016), pp. 376-383. (Paper presented at the 2016 ACM International Conference on Computing Frontiers, held in Como, Italy, 16-19 May 2016) [10.1145/2903150.2911715].



Year: 2016
Volume title: CF'16 Proceedings of the ACM International Conference on Computing Frontiers
First page: 376
Last page: 383
DOI: https://dx.doi.org/10.1145/2903150.2911715
Appears in types: 4.01 Contribution in conference proceedings


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/572732

Warning: the data displayed have not been validated by the university.
