We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm has three steps: first, the size of the problem is reduced by selecting ten suspicious plagiarists using an n-gram distance on properly recoded texts; second, matches are searched for after a T9-like recoding; third, a “joining algorithm” merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed.
Keywords: n-grams, plagiarism, coding, string matching
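The second and third steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the recoding table (standard phone keypad, whitespace to `0`, any other character to `1`), the function names `t9_recode`, `find_matches` and `join_matches`, and the parameters `n` and `max_gap` are all assumptions made for illustration, and the greedy grouping only stands in for the paper's “joining algorithm”.

```python
from collections import defaultdict

# Standard phone keypad layout (assumed; the record does not give the paper's table).
KEYPAD = {"abc": "2", "def": "3", "ghi": "4", "jkl": "5",
          "mno": "6", "pqrs": "7", "tuv": "8", "wxyz": "9"}
T9 = {ch: digit for letters, digit in KEYPAD.items() for ch in letters}


def t9_recode(text):
    """Recode one character at a time, so positions map back to the original text.

    Letters become keypad digits, whitespace becomes '0', anything else '1'
    (an assumed convention, chosen only for this sketch).
    """
    out = []
    for ch in text:
        low = ch.lower()
        if low in T9:
            out.append(T9[low])
        elif ch.isspace():
            out.append("0")
        else:
            out.append("1")
    return "".join(out)


def find_matches(suspicious, source, n=8):
    """Return (pos_in_suspicious, pos_in_source) pairs whose recoded n-grams coincide."""
    a, b = t9_recode(suspicious), t9_recode(source)
    index = defaultdict(list)  # recoded n-gram -> start positions in the source
    for j in range(len(b) - n + 1):
        index[b[j:j + n]].append(j)
    return [(i, j)
            for i in range(len(a) - n + 1)
            for j in index.get(a[i:i + n], [])]


def join_matches(matches, max_gap=50):
    """Greedily group matches that lie within max_gap of the previous one in both texts."""
    groups = []
    for i, j in sorted(matches):
        if groups:
            last_i, last_j = groups[-1][-1]
            if i - last_i <= max_gap and abs(j - last_j) <= max_gap:
                groups[-1].append((i, j))
                continue
        groups.append([(i, j)])
    return groups
```

Because the recoding shrinks the alphabet, small character-level edits are less likely to break a match, and tolerating gaps when grouping is what lets a passage with inserted or replaced words still surface as one candidate region. The values of `n` and `max_gap` here are arbitrary and would need tuning on real data.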
Title: | A plagiarism detection procedure in three steps: selection, matches and “squares” |
Author(s): | C. Basile; D. Benedetto; E. Caglioti; G. Cristadoro; M. Degli Esposti |
Author(s) (Unibo): | |
Year: | 2009 |
Book title: | Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse |
First page: | 19 |
Last page: | 24 |
Date record finalized in UGOV: | 14 Dec 2009 |
Appears in categories: | 4.01 Contribution in conference proceedings |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.