Making the Most of Scarce Input Data in Deep Learning-based Source Code Classification for Heterogeneous Device Mapping

Emanuele Parisi (co-first author); Francesco Barchi (co-first author); Andrea Bartolini; Andrea Acquaviva

2021

Abstract

Despite its relatively recent history, Deep Learning (DL) based source code analysis is already a cornerstone of machine learning for compiler optimization. When applied to classifying pieces of code to identify the best computation unit in a heterogeneous System-on-Chip, it can effectively support decisions that a programmer would otherwise have to make manually. Several techniques have been proposed that exploit different networks and input information, prominently sequence-based and graph-based representations, complemented by auxiliary information typically related to payload and device configuration. While the accuracy of DL methods strongly depends on the training and test datasets, so far no exhaustive and statistically meaningful analysis has been done of their impact on the results or of how to effectively extract the available information. This is also relevant given the scarce availability of source code datasets that can be labelled by profiling on heterogeneous compute units. In this paper, we first present such a study, which leads us to separate the contributions of code sequences and auxiliary inputs. Starting from this analysis, we then demonstrate that normalizing the auxiliary information makes it possible to improve on state-of-the-art accuracy. Finally, we propose a novel approach exploiting Siamese networks that further improves mapping accuracy by increasing the cardinality of the dataset, thus compensating for its relatively small size.
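
The two ideas named in the abstract, normalizing auxiliary inputs and enlarging a small dataset through Siamese-style pairing, can be illustrated with a minimal sketch. This is not the authors' implementation: the z-score normalization, the feature names (payload size, work-group size), the sample values, and the helper functions `normalize_columns` and `make_pairs` are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def normalize_columns(rows):
    """Z-score each auxiliary feature column (zero mean, unit variance)."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [
        tuple((value - mu) / sigma for value, (mu, sigma) in zip(row, stats))
        for row in rows
    ]


def make_pairs(samples):
    """Build every unordered pair; the target is 1 when both labels agree."""
    return [
        (a_feat, b_feat, int(a_label == b_label))
        for (a_feat, a_label), (b_feat, b_label) in combinations(samples, 2)
    ]


# Hypothetical auxiliary inputs: (payload size in bytes, work-group size).
aux = [(1024.0, 64.0), (4096.0, 128.0), (65536.0, 256.0), (262144.0, 512.0)]
# Hypothetical best-device labels obtained by profiling.
labels = ["CPU", "CPU", "GPU", "GPU"]

normalized = normalize_columns(aux)
pairs = make_pairs(list(zip(normalized, labels)))
print(len(aux), "samples ->", len(pairs), "pairs")
```

With n labelled samples, exhaustive pairing yields n(n-1)/2 training pairs, so the four samples above become six pairs. This quadratic growth is one common way a pair-based model gets more training examples out of a small dataset.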

Year: 2021
Journal: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
DOI: https://dx.doi.org/10.1109/TCAD.2021.3114617
Citation: Emanuele Parisi, F.B. (2021). Making the Most of Scarce Input Data in Deep Learning-based Source Code Classification for Heterogeneous Device Mapping. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 41(6), 1-12 [10.1109/TCAD.2021.3114617].
All authors: Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Andrea Acquaviva
Appears in types: 1.01 Journal article

Files in this item:

File	Access	Type	License	Size	Format
making the most post print.pdf	Open access	Postprint	Free open-access license	1.83 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/862449
