To plagiarise is to take credit for another person's work. In particular, plagiarism in text means including text fragments (or even an entire document) from an author without giving them the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text by means of statistical Language Models (LMs) and perplexity. Preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a Language Model calculated over an author's text, is a relevant feature in plagiarism detection.
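To make the idea of "perplexity of a text segment given a Language Model" concrete, here is a minimal sketch. It assumes a bigram model with add-one smoothing and whitespace tokenisation; the abstract does not state the n-gram order, smoothing method, or tokenisation actually used, so these choices and the function names `train_bigram_lm` and `perplexity` are illustrative only.

```python
import math
from collections import Counter


def train_bigram_lm(tokens):
    """Collect bigram statistics from the reference author's tokens."""
    contexts = Counter(tokens[:-1])             # how often each word starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each word pair occurs
    vocab_size = len(set(tokens)) + 1           # +1 slot for unseen words (assumption)
    return contexts, bigrams, vocab_size


def perplexity(segment, lm):
    """Perplexity of a token segment under an add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = lm
    pairs = list(zip(segment, segment[1:]))
    if not pairs:
        raise ValueError("segment needs at least two tokens")
    log_prob = 0.0
    for prev, word in pairs:
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(pairs))


# Toy data, invented purely for illustration.
author_text = "the cat sat on the mat and the cat ate the fish".split()
lm = train_bigram_lm(author_text)

in_style = "the cat sat on the mat".split()
out_of_style = "quantum fields interact through gauge bosons".split()

print(perplexity(in_style, lm), perplexity(out_of_style, lm))
```

In this toy setting the segment that reuses the author's word sequences receives a lower perplexity than the unrelated one. How such scores are turned into a plagiarism decision is not specified in the abstract, so no threshold is suggested here.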

Towards the exploitation of statistical language models for plagiarism detection with reference / Barron-Cedeno A.; Rosso P. - Electronic. - 377:(2008), pp. 15-19. (Paper presented at the 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008, held in Patras, Greece, in 2008).

Towards the exploitation of statistical language models for plagiarism detection with reference

Barron-Cedeno A.; Rosso P.
2008

CEUR Workshop Proceedings, pp. 15-19
Files in this item:
paper2.pdf (open access)
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 157.58 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709318