Machine learning (ML) is making its way into source code analysis. Most of the time, this happens with the help of Natural Language Processing (NLP) techniques, which often represent their input as a sequence of tokens. This representation is reasonable when processing text because words related to the same object usually follow each other. In source code, however, it can be inadequate because of the way source code is executed. Graphs can be much more adequate for representing source code because they can capture the dependency structure of a program. Due to recent advances in machine learning on graphs, researchers have started to explore graph-based representations of software for machine learning applications. There is no single way to represent a program as a graph. For this reason, researchers have explored different alternatives, such as function call graphs (FCG), data flow graphs (DFG), control flow graphs (CFG), or their mixtures. In this survey, we give an overview of approaches for representing software as graphs and of how these representations help to solve machine learning tasks.
Romanov V, Ivanov V, Succi G (2020). Approaches for Representing Software as Graphs for Machine Learning Applications. IEEE [10.1109/ICS51289.2020.00109].
Approaches for Representing Software as Graphs for Machine Learning Applications
Succi G
2020