Heterogeneous Embedded Neural Processing Units Utilizing PCM-Based Analog In-Memory Computing

Conti, F.; Franchi Scarselli, E.; Garofalo, A.; Redaelli, A.
2024

Abstract

We propose an embedded Neural Processing Unit (NPU) architecture for deep learning inference to address the stringent energy, area, and cost requirements of edge AI. This heterogeneous architecture integrates a variety of digital and analog accelerator nodes to cater to diverse operation types and precision requirements. To achieve high energy efficiency while maintaining substantial non-volatile on-chip weight capacity, we utilize Analog In-Memory Computing (AIMC) tiles based on Phase-Change Memory (PCM) for Matrix-Vector Multiplications (MVMs). Additionally, a digital data path and a programmable software cluster facilitate end-to-end inference across multiple precision levels. The NPU is projected to deliver competitive throughput for transformer Neural Networks (NNs), rivaling high-end System-on-Chips (SoCs) for mobile devices and edge accelerators fabricated at more advanced technology nodes.
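The core operation offloaded to the AIMC tiles — a matrix-vector multiplication computed on PCM conductances — can be sketched in NumPy as follows. All device parameters (full-scale conductance `g_max`, noise level `noise_std`, ADC resolution `adc_bits`) are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def aimc_mvm(weights, x, g_max=25e-6, noise_std=0.02, adc_bits=8):
    """Simulate one MVM on a PCM-based analog in-memory computing tile.

    Signed weights are mapped onto device conductances, perturbed by
    programming/read noise, accumulated in the analog domain, and
    digitized by a limited-resolution ADC. Parameters are illustrative,
    not taken from the paper.
    """
    scale = np.max(np.abs(weights))           # weight-to-conductance scaling
    g = weights / scale * g_max               # target conductances
    # Gaussian programming/read noise, proportional to full-scale conductance.
    g_noisy = g + rng.normal(0.0, noise_std * g_max, size=g.shape)
    # Analog accumulation along the bit lines (Ohm's law + Kirchhoff's law).
    y_analog = g_noisy @ x
    # Uniform ADC spanning the observed output range.
    levels = 2 ** (adc_bits - 1) - 1
    y_max = np.max(np.abs(y_analog))
    y_quant = np.round(y_analog / y_max * levels) / levels * y_max
    # Rescale from the conductance domain back to the weight domain.
    return y_quant * scale / g_max

W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
y = aimc_mvm(W, x)
print("max |error| vs. exact MVM:", np.max(np.abs(y - W @ x)))
```

For brevity this sketch allows signed conductances; a physical PCM tile would instead encode each signed weight as the difference of a differential conductance pair (G+ minus G-).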
Proceedings of the 2024 IEEE International Electron Devices Meeting (IEDM), pp. 1-4
Boybat, I., Boesch, T., Allegra, M., Baldo, M., Bertolini-Agnoletto, J. J., Burr, G. W., et al. (2024). Heterogeneous Embedded Neural Processing Units Utilizing PCM-Based Analog In-Memory Computing. In Proceedings of the 2024 IEEE International Electron Devices Meeting (IEDM), pp. 1-4. doi: 10.1109/IEDM50854.2024.10873479.
Full author list: Boybat, I.; Boesch, T.; Allegra, M.; Baldo, M.; Bertolini-Agnoletto, J. J.; Burr, G. W.; Buschini, A.; Cabrini, A.; Calvetti, E.; Cappetta, C.; Conti, ...
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1046050