Heterogeneous Embedded Neural Processing Units Utilizing PCM-Based Analog In-Memory Computing

Conti, F.; Franchi Scarselli, E.; Garofalo, A.; Redaelli, A.
2024

Abstract

We propose an embedded Neural Processing Unit (NPU) architecture for deep learning inference to address the stringent energy, area, and cost requirements of edge AI. This heterogeneous architecture integrates a variety of digital and analog accelerator nodes to cater to diverse operation types and precision requirements. To achieve high energy efficiency while maintaining substantial non-volatile on-chip weight capacity, we utilize Analog In-Memory Computing (AIMC) tiles based on Phase-Change Memory (PCM) for Matrix-Vector Multiplications (MVMs). Additionally, a digital data path and a programmable software cluster facilitate end-to-end inference across multiple precision levels. The NPU is projected to deliver competitive throughput for transformer Neural Networks (NNs), rivaling high-end System-on-Chips (SoCs) for mobile devices and edge accelerators fabricated at more advanced technology nodes.
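The core operation offloaded to the AIMC tiles — a matrix-vector multiplication computed on PCM conductances — can be sketched in NumPy as follows. All device parameters (full-scale conductance `g_max`, noise level `noise_std`, ADC resolution `adc_bits`) are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def aimc_mvm(weights, x, g_max=25e-6, noise_std=0.02, adc_bits=8):
    """Simulate one MVM on a PCM-based analog in-memory computing tile.

    Signed weights are mapped onto device conductances, perturbed by
    programming/read noise, accumulated in the analog domain, and
    digitized by a limited-resolution ADC. Parameters are illustrative,
    not taken from the paper.
    """
    scale = np.max(np.abs(weights))           # weight-to-conductance scaling
    g = weights / scale * g_max               # target conductances
    # Gaussian programming/read noise, proportional to full-scale conductance.
    g_noisy = g + rng.normal(0.0, noise_std * g_max, size=g.shape)
    # Analog accumulation along the bit lines (Ohm's law + Kirchhoff's law).
    y_analog = g_noisy @ x
    # Uniform ADC spanning the observed output range.
    levels = 2 ** (adc_bits - 1) - 1
    y_max = np.max(np.abs(y_analog))
    y_quant = np.round(y_analog / y_max * levels) / levels * y_max
    # Rescale from the conductance domain back to the weight domain.
    return y_quant * scale / g_max

W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
y = aimc_mvm(W, x)
print("max |error| vs. exact MVM:", np.max(np.abs(y - W @ x)))
```

For brevity this sketch allows signed conductances; a physical PCM tile would instead encode each signed weight as the difference of a differential conductance pair (G+ minus G-).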
Proceedings of the 2024 IEEE International Electron Devices Meeting (IEDM), pp. 1-4
Boybat, I., Boesch, T., Allegra, M., Baldo, M., Bertolini-Agnoletto, J. J., Burr, G. W., et al. (2024). Heterogeneous Embedded Neural Processing Units Utilizing PCM-Based Analog In-Memory Computing. In Proceedings of the 2024 IEEE International Electron Devices Meeting (IEDM), pp. 1-4. doi: 10.1109/IEDM50854.2024.10873479.
Full author list: Boybat, I.; Boesch, T.; Allegra, M.; Baldo, M.; Bertolini-Agnoletto, J. J.; Burr, G. W.; Buschini, A.; Cabrini, A.; Calvetti, E.; Cappetta, C.; Conti, ...
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1046050