We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm is divided into three steps: (1) a first reduction of the size of the problem through the selection of ten suspicious plagiarists, using an n-gram distance on properly recoded texts; (2) a search for matches after a T9-like recoding; (3) a “joining algorithm” that merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed.
Keywords: n-grams, plagiarism, coding, string matching
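The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the standard phone-keypad letter grouping for the T9-like recoding, a Jaccard-style n-gram set distance, the default n = 8, the `max_gap` threshold, and all function names are hypothetical and are not taken from the authors' code.

```python
# Assumed mapping: the standard phone keypad (the abstract does not specify one).
_KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
           "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
_T9 = {ch: digit for digit, letters in _KEYPAD.items() for ch in letters}


def t9_recode(text):
    """Map letters to keypad digits; collapse any run of other characters to one '0'."""
    out = []
    for ch in text.lower():
        code = _T9.get(ch, "0")
        if code == "0" and out and out[-1] == "0":
            continue  # keep a single separator per run of non-letters
        out.append(code)
    return "".join(out)


def ngram_set(s, n):
    """Set of all length-n substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_distance(a, b, n=8):
    """Illustrative distance: 1 minus the Jaccard overlap of the two n-gram sets."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def find_matches(suspicious, source, n=8):
    """Pairs (i, j) such that suspicious[i:i+n] == source[j:j+n]."""
    index = {}
    for j in range(len(source) - n + 1):
        index.setdefault(source[j:j + n], []).append(j)
    return [(i, j)
            for i in range(len(suspicious) - n + 1)
            for j in index.get(suspicious[i:i + n], [])]


def join_matches(matches, max_gap=50):
    """Greedily merge matches that lie close together in both texts.

    Returns regions (i_start, i_end, j_start, j_end), expressed in n-gram
    start positions. A match extends the most recent region only if it is
    within max_gap of that region in both coordinates.
    """
    regions = []
    for i, j in sorted(matches):
        if regions:
            i0, i1, j0, j1 = regions[-1]
            if i - i1 <= max_gap and j0 - max_gap <= j <= j1 + max_gap:
                regions[-1] = (i0, max(i1, i), min(j0, j), max(j1, j))
                continue
        regions.append((i, i, j, j))
    return regions
```

Under these assumptions the recoding is deliberately lossy: `t9_recode("good")` and `t9_recode("home")` both give `"4663"`. A candidate-selection step would rank source texts by `ngram_distance` and keep the closest ones, after which `find_matches` and `join_matches` would be run on each retained pair.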
C. Basile, D. Benedetto, E. Caglioti, G. Cristadoro, M. Degli Esposti (2009). A plagiarism detection procedure in three steps: selection, matches and “squares”. SINE LOCO : sine nomine.
A plagiarism detection procedure in three steps: selection, matches and “squares”
CRISTADORO, GIAMPAOLO; DEGLI ESPOSTI, MIRKO
2009