
Uncovering source code reuse in large-scale academic environments / Enrique Flores, Alberto Barrón-Cedeño, Lidia Moreno, Paolo Rosso. - In: COMPUTER APPLICATIONS IN ENGINEERING EDUCATION. - ISSN 1099-0542. - ELECTRONIC. - 23:3(2015), pp. 383-390. [10.1002/cae.21608]

Uncovering source code reuse in large-scale academic environments

Alberto Barrón-Cedeño
2015

Abstract

The advent of the Internet has led to an increase in content reuse, including the reuse of source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. Academia is a good example: massive courses are taught to students who must demonstrate that they have acquired the required knowledge. The need to detect content reuse in near real time encourages the development of automatic systems, such as the source code reuse detection system described in this paper. Our approach is based on comparing programs at the character level, and it is able to find potential cases of reuse across a very large number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes to identifier names, comments and indentation.
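The character-level comparison mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the details, so every choice below is an assumption rather than the authors' configuration: C-style comments are stripped and case and whitespace are removed, each program is represented as a bag of overlapping character 3-grams, and every pair of submissions is scored with cosine similarity. The names `normalize`, `char_ngrams`, `cosine` and `rank_pairs` are hypothetical and used only for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def normalize(source: str) -> str:
    """Assumed preprocessing (not from the paper): drop C-style comments,
    lowercase, and remove all whitespace so layout changes disappear.
    The regexes are naive and also strip '//' inside string literals."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                     # line comments
    return re.sub(r"\s+", "", source.lower())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; n=3 is an assumed default."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pairs(submissions: dict[str, str], n: int = 3) -> list[tuple[str, str, float]]:
    """Score every pair of submissions and sort by decreasing similarity."""
    profiles = {name: char_ngrams(normalize(code), n) for name, code in submissions.items()}
    scored = [
        (x, y, cosine(profiles[x], profiles[y]))
        for x, y in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical submissions: bob.c renames identifiers, reindents and adds a comment.
    subs = {
        "alice.c": "int sum(int a, int b) { return a + b; }",
        "bob.c": "int sum(int x,int y){\n  // add\n  return x+y;\n}",
        "carol.c": 'void hello() { printf("hi"); }',
    }
    for x, y, score in rank_pairs(subs):
        print(f"{x} vs {y}: {score:.2f}")
```

In this sketch, changes to indentation and comments vanish during normalization, while renamed identifiers only disturb the n-grams that overlap them, so the alice.c/bob.c pair still ranks first. The all-pairs comparison is quadratic in the number of submissions, which is manageable for one course but would call for candidate filtering or indexing at a larger scale.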
Enrique Flores, Alberto Barrón-Cedeño, Lidia Moreno, Paolo Rosso
Files in this record:
Uncovering Source Code Reuse.pdf

Open Access since 14/09/2015

Type: Postprint
License: Free open-access license
Size: 235.5 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/707683
Citations
  • PubMed Central: not available
  • Scopus: 23
  • Web of Science (ISI): 19