A plagiarism detection procedure in three steps: selection, matches and “squares”
Cristadoro, Giampaolo; Degli Esposti, Mirko
2009
Abstract
We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm is divided into three steps: (1) a first reduction of the size of the problem by a selection of ten suspicious plagiarists, using an n-gram distance on properly recoded texts; (2) a search for matches after T9-like recoding; (3) a “joining algorithm” that merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed.
Keywords: n-grams, plagiarism, coding, string matching
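As an illustration of the second step only, the following is a minimal sketch of what a T9-like recoding followed by a match search could look like. It is not the authors' implementation: the abstract does not specify the exact recoding or matching procedure, so the standard phone-keypad letter grouping, the use of "0" for every non-letter character, the function names `t9_recode` and `find_matches`, and the `min_len` threshold are all assumptions made here for the example.

```python
# Hypothetical sketch of a "T9-like" recoding and a naive match search.
# The keypad grouping below is the standard phone layout; treating every
# non-letter as "0" is an assumption, not something stated in the abstract.
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in T9_KEYS.items() for ch in letters}


def t9_recode(text, other="0"):
    """Map each letter to its keypad digit; any other character becomes `other`."""
    return "".join(LETTER_TO_DIGIT.get(ch, other) for ch in text.lower())


def find_matches(source, suspicious, min_len=8):
    """Return (source_pos, suspicious_pos, length) for every maximal common
    substring of the recoded texts that is at least `min_len` symbols long."""
    a, b = t9_recode(source), t9_recode(suspicious)

    # Index every window of length min_len in the recoded source text.
    index = {}
    for i in range(len(a) - min_len + 1):
        index.setdefault(a[i:i + min_len], []).append(i)

    matches = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], []):
            # Skip seeds that only continue a match already reported.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right as far as the texts agree.
            k = min_len
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            matches.append((i, j, k))
    return matches
```

Because several letters share one digit, the recoded alphabet has only ten symbols, and distinct words can collide (for example "fo" and "do" recode identically). A joining step such as the one named in the abstract would then operate on the list of matches returned here; that step is not sketched.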