
CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Moritz Scherer; Georg Rutishauser; Lukas Cavigelli; Luca Benini
2022

Abstract

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks (TNNs). CUTIE, the completely unrolled ternary inference engine, focuses on minimizing noncomputational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by: 1) a data-path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data reuse; 2) targeting TNNs which, in contrast to binary NNs, allow for sparse weights that reduce switching activity; and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x.
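
As a rough illustration of the ideas in the abstract (not the CUTIE datapath or training recipe itself, which the paper specifies in detail), the NumPy sketch below shows how ternary weights in {-1, 0, +1} differ from binary ones: weights with magnitude below a zero threshold are quantized to 0, and every zero weight lets the corresponding multiply-accumulate lane be skipped ("silenced"), the software analogue of the reduced switching activity referred to above. The threshold value and function names are illustrative assumptions only.

```python
import numpy as np

def ternarize(weights, zero_threshold=0.05):
    """Quantize real-valued weights to {-1, 0, +1}.

    A larger zero_threshold maps more weights to 0, i.e. sparser
    ternary filters. Illustrative only: the threshold and the
    training procedure used for CUTIE are not reproduced here.
    """
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > zero_threshold] = 1
    q[weights < -zero_threshold] = -1
    return q

def ternary_dot(activations, ternary_weights):
    """Dot product with ternary weights.

    Zero weights contribute nothing, so their lanes are skipped --
    a software analogue of the 'silencing' that cuts switching
    activity in the hardware datapath.
    """
    acc = 0
    for a, w in zip(activations, ternary_weights):
        if w == 0:          # silenced lane: no work performed
            continue
        acc += a if w == 1 else -a
    return acc

# Example: a higher threshold gives more zeros, hence fewer active lanes.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=16)
for t in (0.02, 0.10):
    q = ternarize(w, zero_threshold=t)
    print(f"threshold={t:.2f}  sparsity={np.mean(q == 0):.2f}")
```

Raising the fraction of zero weights during training is in the same spirit as the sparsity-oriented training method mentioned in the abstract: more zero weights mean more silenced lanes and less dynamic power.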
CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency / Moritz Scherer; Georg Rutishauser; Lukas Cavigelli; Luca Benini. - In: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS. - ISSN 0278-0070. - ELETTRONICO. - 41:4(2022), pp. 1020-1033. [10.1109/tcad.2021.3075420]
Files in this item:

CUTIE_Beyond_PetaOp_s_W_Ternary_DNN_Inference_Acceleration_With_Better-Than-Binary_Energy_Efficiency.pdf
Access: restricted access
Description: publisher's version
Type: publisher's PDF version
License: restricted-access license
Size: 1.93 MB
Format: Adobe PDF

CUTIE Beyond PetaOp_aam.pdf
Access: open access since 04/10/2022
Description: post-print
Type: postprint
License: open-access license, Creative Commons Attribution - NonCommercial (CC BY-NC)
Size: 3.97 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/905381
Citations
  • PubMed Central: n/a
  • Scopus: 7
  • Web of Science: 8