The spread of deep learning on embedded devices has prompted the development of numerous methods to optimize the deployment of deep neural networks (DNNs). Prior work has mainly focused on: 1) efficient DNN architectures; 2) network optimization techniques, such as pruning and quantization; 3) optimized algorithms to speed up the execution of the most computationally intensive layers; and 4) dedicated hardware to accelerate the data flow and computation. However, cross-level optimization remains under-researched because the space of approaches becomes too large to test exhaustively for a globally optimized solution, which leads to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyze the methods to improve the deployment of DNNs across the different levels of software optimization. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs. The framework relies on a reinforcement learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimized solution that improves performance and reduces memory usage on embedded CPU platforms. We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to $4\times$ improvement in performance and over $2\times$ reduction in memory with negligible loss in accuracy with respect to the BLAS floating-point implementation.

De Prado M., Mundy A., Saeed R., Denna M., Pazos N., Benini L. (2021). Automated Design Space Exploration for Optimized Deployment of DNN on Arm Cortex-A CPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 40(11), 2293-2305 [10.1109/TCAD.2020.3046568].

Automated Design Space Exploration for Optimized Deployment of DNN on Arm Cortex-A CPUs

Benini L.
2021

De Prado M.; Mundy A.; Saeed R.; Denna M.; Pazos N.; Benini L.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/869628