
Prado M.D., Pazos N., Benini L. (2019). Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems. New York : Institute of Electrical and Electronics Engineers Inc. [10.23919/DATE.2019.8714959].

Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

Benini L.
2019

Abstract

Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has reached a mature and remarkable state, inference latency and throughput remain major concerns, especially when targeting low-cost and low-power embedded platforms. CNNs' inference latency may become a bottleneck for industrial adoption of Deep Learning, as it is a crucial specification for many real-time processes. Furthermore, deploying CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline, and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and time-to-solution are much better than with Random Search, achieving up to 15x better results for a short-time search.
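The core idea in the abstract — an RL agent that learns which primitive to use for each layer, rewarded by measured latency — can be illustrated with a minimal epsilon-greedy Q-learning sketch. This is a hypothetical reconstruction, not the paper's implementation: the latency table, primitive names, hyperparameters, and the treatment of each layer as an independent choice (the paper couples layers through data-layout transformations) are all assumptions for illustration.

```python
import random

# Hypothetical per-layer latency table (ms). In QS-DNN these numbers
# would come from on-device benchmarking by the inference engine.
LATENCY = {
    0: {"im2col+gemm": 4.0, "winograd": 2.5, "direct": 6.0},
    1: {"im2col+gemm": 3.0, "winograd": 5.5, "direct": 2.0},
    2: {"im2col+gemm": 1.5, "winograd": 1.0, "direct": 2.5},
}

def q_search(latency, episodes=500, eps=0.3, alpha=0.5, seed=0):
    """Epsilon-greedy Q-learning over per-layer primitive choices.
    State = layer index, action = primitive, reward = negative latency."""
    rng = random.Random(seed)
    q = {layer: {a: 0.0 for a in acts} for layer, acts in latency.items()}
    for _ in range(episodes):
        for layer, acts in latency.items():
            if rng.random() < eps:
                a = rng.choice(list(acts))            # explore
            else:
                a = max(q[layer], key=q[layer].get)   # exploit
            reward = -acts[a]                         # faster => higher reward
            q[layer][a] += alpha * (reward - q[layer][a])
    # Greedy readout: best-known primitive per layer.
    return {layer: max(qa, key=qa.get) for layer, qa in q.items()}

best = q_search(LATENCY)
total = sum(LATENCY[layer][p] for layer, p in best.items())
print(best, f"{total:.1f} ms")
```

Because the Q-values start at zero (optimistic relative to the negative rewards), every primitive gets tried at least once, after which the greedy policy settles on the fastest option per layer.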
2019
Proceedings of the 2019 Design, Automation and Test in Europe Conference and Exhibition, DATE 2019
Pages 1409-1414
Prado M.D.; Pazos N.; Benini L.
Files in this record:

Learning to infer.pdf
  Restricted access
  Description: Publisher's version
  Type: Publisher's PDF / Version of Record
  License: Restricted-access license
  Size: 301.8 kB
  Format: Adobe PDF

Learning to infer post print.pdf
  Open Access since 16/11/2019
  Type: Postprint / Author's Accepted Manuscript (AAM) - version accepted for publication after peer review
  License: Open Access license. Creative Commons Attribution (CC BY)
  Size: 437.23 kB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/729737
Citations
  • Scopus: 10
  • ISI Web of Science: 9