Work-in-progress: DORY: Lightweight memory hierarchy management for deep NN inference on IoT endnodes

Burrello A.;Conti F.;Garofalo A.;Rossi D.;Benini L.

2019

Abstract

IoT endnodes often couple a small, fast L1 scratchpad memory with a higher-capacity but lower-bandwidth, slower L2 background memory. The absence of a coherent hardware cache hierarchy saves energy, but at the cost of labor-intensive explicit memory management, which complicates the deployment of algorithms with a large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% accuracy on CIFAR-10 runs 3.2× faster than when executed directly from L2 memory, while consuming 1.9× less energy.
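The combination of static tiling and double buffering named in the abstract can be illustrated with a small functional model. This is only a sketch under stated assumptions, not DORY's implementation: `tile_ranges`, `dma_copy`, and `run_double_buffered` are hypothetical names, the "DMA" is a synchronous list copy standing in for an asynchronous L2-to-L1 transfer, and the data is a flat 1-D list rather than a multi-dimensional activation or weight tensor.

```python
from typing import Callable, Iterator, List, Tuple


def tile_ranges(total: int, tile: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) index ranges that statically tile `total` elements."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    for start in range(0, total, tile):
        yield start, min(start + tile, total)


def dma_copy(src: List[int], start: int, end: int, dst: List[int]) -> int:
    """Stand-in for an asynchronous L2 -> L1 DMA transfer (hypothetical helper).

    The copy is synchronous here; real hardware would return immediately and
    signal completion later. Returns the number of elements copied.
    """
    n = end - start
    dst[:n] = src[start:end]
    return n


def run_double_buffered(
    l2_data: List[int],
    l1_tile: int,
    compute: Callable[[List[int]], List[int]],
) -> List[int]:
    """Process `l2_data` tile by tile using two L1 buffers in ping-pong fashion."""
    tiles = list(tile_ranges(len(l2_data), l1_tile))
    if not tiles:
        return []
    buffers = [[0] * l1_tile, [0] * l1_tile]  # two L1-resident tile buffers
    lengths = [0, 0]                          # valid elements in each buffer
    out: List[int] = []

    # Prologue: fetch the first tile before any computation starts.
    lengths[0] = dma_copy(l2_data, *tiles[0], buffers[0])

    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        # Issue the transfer for tile i+1; on real hardware it would overlap
        # with the computation on tile i below.
        if i + 1 < len(tiles):
            lengths[nxt] = dma_copy(l2_data, *tiles[i + 1], buffers[nxt])
        # Compute only on the valid elements of the current buffer.
        out.extend(compute(buffers[cur][: lengths[cur]]))
    return out
```

For example, `run_double_buffered(list(range(10)), 4, lambda t: [2 * x for x in t])` processes three tiles (of 4, 4, and 2 elements) and returns the same result as applying the function to the whole list at once. The element-wise function is a placeholder for a layer kernel; tiling a real convolution would also have to handle overlapping borders between tiles, which this sketch does not model.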

Year: 2019
Volume title: Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, CODES/ISSS 2019
First page: 1
Last page: 2
DOI: https://dx.doi.org/10.1145/3349567.3351726
Citation: Burrello A., Conti F., Garofalo A., Rossi D., Benini L. (2019). Work-in-progress: DORY: Lightweight memory hierarchy management for deep NN inference on IoT endnodes. Association for Computing Machinery, Inc [10.1145/3349567.3351726].
All authors: Burrello A.; Conti F.; Garofalo A.; Rossi D.; Benini L.
Appears in publication types: 4.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
esweek_dory_postprint.pdf (open access; type: publisher's version (PDF); license: free open-access license)	2.06 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/730310
