Towards the detection of cross-language source code reuse

Barron-Cedeno A.
2011

Abstract

The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (e.g., comments, code, reserved words) leads to different results. For three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered. © 2011 Springer-Verlag.
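As an illustration of the idea of comparing character n-grams across languages, the following is a minimal sketch and not the authors' implementation. The abstract does not state the n-gram length, the weighting scheme, or the similarity measure, so trigrams, raw frequencies, and cosine similarity are assumptions here. The function names and the regex-based comment stripping (which ignores string literals) are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Assumed comment syntaxes per language; a crude approximation that does not
# account for comment markers appearing inside string literals.
COMMENT_PATTERNS = {
    "c++": [r"/\*[\s\S]*?\*/", r"//[^\n]*"],
    "java": [r"/\*[\s\S]*?\*/", r"//[^\n]*"],
    "python": [r"#[^\n]*"],
}


def strip_comments(source, language):
    """Replace every comment of the given language with a single space."""
    for pattern in COMMENT_PATTERNS[language]:
        source = re.sub(pattern, " ", source)
    return source


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity(src_a, lang_a, src_b, lang_b, n=3):
    """Compare two programs on their full source text with comments removed."""
    profile_a = char_ngrams(strip_comments(src_a, lang_a), n)
    profile_b = char_ngrams(strip_comments(src_b, lang_b), n)
    return cosine(profile_a, profile_b)
```

Calling `similarity(java_source, "java", python_source, "python")` returns a score between 0 and 1. Ranking candidate sources by this score is one possible way to flag likely reuse, and the other code sections mentioned in the abstract could be compared by changing what is passed to `char_ngrams`.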
Year: 2011
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 250-253
Towards the detection of cross-language source code reuse / Flores E.; Barron-Cedeno A.; Rosso P.; Moreno L. - ELECTRONIC. - 6716:(2011), pp. 250-253. (Paper presented at the 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, held in Alicante, Spain, in 2011) [10.1007/978-3-642-22327-3_31].
Flores E.; Barron-Cedeno A.; Rosso P.; Moreno L.
Files in this item:
File Size Format
Barron Cedeno.pdf

open access

Type: Postprint
License: Open Access License. Creative Commons Attribution (CC BY)
Size 263.41 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709277
Citations
  • PubMed Central: N/A
  • Scopus: 27
  • Web of Science (ISI): 15