Towards the detection of cross-language source code reuse / Flores E.; Barron-Cedeno A.; Rosso P.; Moreno L. - Electronic. - 6716:(2011), pp. 250-253. (Paper presented at the 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, held in Alicante, Spain, in 2011) [10.1007/978-3-642-22327-3_31].
Towards the detection of cross-language source code reuse
Barron-Cedeno A.
2011
Abstract
The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on character n-gram comparison, show that considering different sections of the code (e.g., comments, code, reserved words) leads to different results. For three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered. © 2011 Springer-Verlag.

File | Size | Format |
---|---|---|
Barron Cedeno.pdf (open access; type: Postprint; license: Open Access License, Creative Commons Attribution (CC BY)) | 263.41 kB | Adobe PDF |
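
The character n-gram comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram size (3), the lowercasing and whitespace removal, the regex-based comment stripping, and the use of cosine similarity over raw n-gram counts are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, simplified comment patterns. They do not handle comment
# markers that appear inside string literals.
COMMENT_PATTERNS = {
    "c_like": re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL),  # C++ and Java
    "python": re.compile(r"#[^\n]*"),
}


def strip_comments(source, style):
    """Remove comments from source code using the pattern for `style`."""
    return COMMENT_PATTERNS[style].sub("", source)


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and dropping whitespace."""
    normalized = re.sub(r"\s+", "", text.lower())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_language_similarity(src_a, style_a, src_b, style_b, n=3):
    """Compare two programs with comments discarded (assumed setup)."""
    grams_a = char_ngrams(strip_comments(src_a, style_a), n)
    grams_b = char_ngrams(strip_comments(src_b, style_b), n)
    return cosine_similarity(grams_a, grams_b)
```

Calling `cross_language_similarity(java_source, "c_like", python_source, "python")` returns a score between 0 and 1; higher values suggest more shared character sequences, which could then be checked against a threshold chosen on labeled data.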