Ottavi G., Karunaratne G., Conti F., Boybat I., Benini L., Rossi D. (2021). End-To-end 100-TOPS/W Inference with Analog In-Memory Computing: Are We There Yet? New York: Institute of Electrical and Electronics Engineers Inc. [10.1109/AICAS51828.2021.9458409].

End-To-end 100-TOPS/W Inference with Analog In-Memory Computing: Are We There Yet?

Ottavi G.; Karunaratne G.; Conti F.; Boybat I.; Benini L.; Rossi D.
2021

Abstract

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over the software implementation, on depthwise layers the inability to map parameters efficiently onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution in which pointwise convolutions are executed on the IMA and depthwise convolutions on the cluster cores, achieving a 3x speed-up over software execution while saving 50% of area compared to an all-IMA solution with similar performance.
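As a concrete illustration of the hybrid execution described in the abstract, below is a minimal Python/NumPy sketch of a MobileNetV2 bottleneck in which the two pointwise (1x1) convolutions are dispatched to an IMA stub while the depthwise 3x3 convolution stays in software. All function and parameter names (ima_pointwise, sw_depthwise3x3, bottleneck, and the tensor shapes) are illustrative assumptions, not the paper's actual interface, and the IMA call is emulated on the host with a plain matrix multiply.

    import numpy as np

    def relu6(x):
        # MobileNetV2 applies ReLU6 after the expand and depthwise stages.
        return np.minimum(np.maximum(x, 0.0), 6.0)

    def ima_pointwise(x, w):
        # Hypothetical IMA offload. A 1x1 convolution is a matrix-vector
        # product per pixel, which maps directly onto a crossbar holding
        # the dense (cin, cout) weight matrix; emulated here as one matmul.
        h, wd, cin = x.shape
        return (x.reshape(-1, cin) @ w).reshape(h, wd, -1)

    def sw_depthwise3x3(x, k):
        # Depthwise 3x3 convolution run in software on the cluster cores:
        # each channel is filtered independently with its own 3x3 kernel
        # (k has shape (3, 3, channels)).
        h, wd, c = x.shape
        xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
        out = np.zeros_like(x)
        for i in range(3):
            for j in range(3):
                out += xp[i:i + h, j:j + wd, :] * k[i, j, :]
        return out

    def bottleneck(x, w_expand, k_dw, w_project):
        # Expand (pointwise, IMA) -> depthwise (software) ->
        # project (pointwise, IMA, linear output).
        y = relu6(ima_pointwise(x, w_expand))
        y = relu6(sw_depthwise3x3(y, k_dw))
        return ima_pointwise(y, w_project)

    # Example with an assumed expansion factor of 6 on a 56x56x24 input:
    x = np.random.rand(56, 56, 24).astype(np.float32)
    y = bottleneck(x,
                   np.random.rand(24, 144).astype(np.float32),
                   np.random.rand(3, 3, 144).astype(np.float32),
                   np.random.rand(144, 24).astype(np.float32))

The split mirrors the abstract's observation: the dense weight matrices of the pointwise stages keep the crossbar fully utilized, whereas the per-channel depthwise kernels would occupy crossbar area without filling it, so they are cheaper to run on the cores.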
2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems, AICAS 2021
pp. 1-4
Files in this record:
endtoend_imc_redux.pdf (Postprint; Adobe PDF; 541.23 kB; open access since 10/12/2022; free open-access license)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/847009
Citations
  • PMC: n/a
  • Scopus: 2
  • Web of Science: 0