
Reducing the plagiarism detection search space on the basis of the Kullback-Leibler distance

Barron-Cedeno, A.; Rosso, P.; Benedi, J.-M.

2009

Abstract

Automatic plagiarism detection against a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential sources. Publications on this task often assume that the search space (the set of reference documents) is small enough for any search strategy to produce good output in a short time. However, this is not always the case. Reference corpora often comprise a large set of original documents, for which a simple exhaustive search becomes practically impossible. Before carrying out an exhaustive search, it is therefore necessary to reduce the search space, that is, the set of documents in the reference corpus, as much as possible. Our experiments with the METER corpus show that a preliminary search space reduction stage, based on the symmetric Kullback-Leibler distance, dramatically reduces the search time. Additionally, it improves the Precision and Recall obtained by a search strategy based on the exhaustive comparison of word n-grams. © Springer-Verlag Berlin Heidelberg 2009.
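The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenisation, the epsilon smoothing, the use of the union vocabulary of each document pair, the `keep` cut-off, the containment score, and every function name are assumptions made for illustration. The distance uses one common symmetric form of the Kullback-Leibler divergence, the sum over terms of (P(t) - Q(t)) * log(P(t) / Q(t)).

```python
import math
from collections import Counter


def term_distribution(counts, vocabulary, epsilon=1e-6):
    """Smoothed relative term frequencies over `vocabulary` (assumed smoothing)."""
    total = sum(counts.values()) or 1  # avoid division by zero for empty texts
    raw = {term: counts[term] / total + epsilon for term in vocabulary}
    norm = sum(raw.values())
    return {term: value / norm for term, value in raw.items()}


def symmetric_kl_distance(text_a, text_b, epsilon=1e-6):
    """Symmetric KL distance: sum of (p - q) * log(p / q) over the shared vocabulary."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocabulary = set(counts_a) | set(counts_b)
    p = term_distribution(counts_a, vocabulary, epsilon)
    q = term_distribution(counts_b, vocabulary, epsilon)
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocabulary)


def reduce_search_space(suspicious, references, keep=2):
    """Stage 1: keep the ids of the `keep` reference documents closest to `suspicious`."""
    distances = {
        doc_id: symmetric_kl_distance(suspicious, text)
        for doc_id, text in references.items()
    }
    return sorted(distances, key=distances.get)[:keep]


def word_ngrams(text, n=3):
    """Set of word n-grams (as tuples) in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, reference, n=3):
    """Stage 2: fraction of the suspicious text's word n-grams found in the reference."""
    suspicious_ngrams = word_ngrams(suspicious, n)
    if not suspicious_ngrams:
        return 0.0
    shared = suspicious_ngrams & word_ngrams(reference, n)
    return len(shared) / len(suspicious_ngrams)


def detect(suspicious, references, keep=2, n=3):
    """Run the n-gram comparison only on the reduced set of candidates."""
    candidates = reduce_search_space(suspicious, references, keep)
    return {doc_id: containment(suspicious, references[doc_id], n) for doc_id in candidates}
```

Calling `detect(suspicious, references)` with a dictionary that maps document ids to texts returns n-gram containment scores only for the candidates that survive the distance-based filter, so the more expensive comparison is never run against the documents discarded in the first stage.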

Year: 2009
Volume title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First page: 523
Last page: 534
Series: Lecture Notes in Artificial Intelligence
DOI: https://dx.doi.org/10.1007/978-3-642-00382-0_42
Citation: Barron-Cedeno, A., Rosso, P., Benedi, J.-M. (2009). Reducing the plagiarism detection search space on the basis of the Kullback-Leibler distance [10.1007/978-3-642-00382-0_42].
All authors: Barron-Cedeno, A.; Rosso, P.; Benedi, J.-M.
Appears in types: 4.01 Conference proceedings contribution

Files in this item: any attachments are not displayed.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709283

Warning: the data displayed have not been validated by the university.
