Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini (2022). CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(4), 1020-1033. DOI: 10.1109/TCAD.2021.3075420.
CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency
Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini
2022
Abstract
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks (TNNs). CUTIE, the completely unrolled ternary inference engine, focuses on minimizing noncomputational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by: 1) a data-path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data reuse; 2) targeting TNNs which, in contrast to binary NNs, allow for sparse weights that reduce switching activity; and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x.
File: CUTIE_Beyond_PetaOp_s_W_Ternary_DNN_Inference_Acceleration_With_Better-Than-Binary_Energy_Efficiency.pdf
Access: restricted (contact the author)
Description: published version
Type: Publisher's version (PDF)
License: restricted-access license
Size: 1.93 MB
Format: Adobe PDF
File: CUTIE Beyond PetaOp_aam.pdf
Access: open access from 04/10/2022
Description: postprint
Type: Postprint
License: open-access license, Creative Commons Attribution - NonCommercial (CC BY-NC)
Size: 3.97 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.