Ottavi G., Karunaratne G., Conti F., Boybat I., Benini L., Rossi D. (2021). End-To-end 100-TOPS/W Inference with Analog In-Memory Computing: Are We There Yet? New York: Institute of Electrical and Electronics Engineers Inc. [10.1109/AICAS51828.2021.9458409].

End-To-end 100-TOPS/W Inference with Analog In-Memory Computing: Are We There Yet?

Ottavi G.; Karunaratne G.; Conti F.; Boybat I.; Benini L.; Rossi D.

2021

Abstract

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but integrating an IMA within a digital system remains challenging. We propose a heterogeneous architecture that couples 8 RISC-V cores with an IMA in a shared-memory cluster, and we analyze the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, evaluating performance, area, and energy efficiency. We show that pointwise layers achieve significant speed-ups over a software implementation, whereas on depthwise layers the inability to map parameters efficiently onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution in which pointwise convolutions are executed on the IMA and depthwise convolutions on the cluster cores, achieving a speed-up of 3x over software (SW) execution while saving 50% of area compared to an all-in IMA solution with similar performance.
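The mapping problem mentioned in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes each layer is laid out as a single weight matrix for matrix-vector multiplication on a crossbar, with one row per input value and one column per output channel. The function names and the 64-channel example are hypothetical.

```python
def pointwise_utilization(c_in: int, c_out: int) -> float:
    """Fraction of occupied cells that hold a real weight for a 1x1 convolution.

    Assumed layout: c_in rows by c_out columns, and every cell is a weight.
    """
    occupied_cells = c_in * c_out
    useful_weights = c_in * c_out
    return useful_weights / occupied_cells


def depthwise_utilization(channels: int, k: int = 3) -> float:
    """Fraction of occupied cells that hold a real weight for a k x k depthwise convolution.

    Assumed layout: one column per channel and k*k rows per channel. Each
    output channel depends only on its own k*k inputs, so the matrix is
    block-diagonal and all other cells in the occupied region stay at zero.
    """
    occupied_cells = (k * k * channels) * channels
    useful_weights = k * k * channels
    return useful_weights / occupied_cells


if __name__ == "__main__":
    print(pointwise_utilization(64, 64))  # dense matrix
    print(depthwise_utilization(64))      # block-diagonal matrix
```

Under this assumed layout, the useful fraction for a depthwise layer is 1 / channels, so it shrinks as the channel count grows. This is one way to read the throughput-versus-area trade-off the abstract describes.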

Year: 2021
Volume title: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems, AICAS 2021
First page: 1
Last page: 4
DOI: https://dx.doi.org/10.1109/AICAS51828.2021.9458409
Publication type: 4.01 Conference proceedings contribution

Files in this record:

File	Size	Format
endtoend_imc_redux.pdf (Open Access from 10/12/2022; Type: Postprint; License: free open-access license)	541.23 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/847009
