Plagiarism detection across distant language pairs / Barron-Cedeno A.; Rosso P.; Agirre E.; Labaka G. - ELECTRONIC. - 2:(2010), pp. 37-45. (Paper presented at the 23rd International Conference on Computational Linguistics, Coling 2010, held in Beijing, China, in 2010).

Plagiarism detection across distant language pairs

Barron-Cedeno A.; Rosso P.; Agirre E.; Labaka G.

2010

Abstract

Plagiarism, the unacknowledged reuse of text, does not end at language boundaries. Cross-language plagiarism occurs when a text is translated from a fragment written in a different language and no proper citation is provided. Regardless of the change of language, the contents and, in particular, the ideas remain the same. Whereas different methods for the detection of monolingual plagiarism have been developed, less attention has been paid to the cross-language case. In this paper we compare two recently proposed cross-language plagiarism detection methods (CL-CNG, based on character n-grams, and CL-ASA, based on statistical translation) with a novel approach to this problem, based on machine translation and monolingual similarity analysis (T+MA). We explore the effectiveness of the three approaches for less related languages. CL-CNG proves not to be appropriate for this kind of language pair, whereas T+MA performs better than the previously proposed models.
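To make the character n-gram idea behind CL-CNG concrete, the following is a minimal sketch and not the authors' implementation. It assumes lowercasing, punctuation removal, character 3-grams, and cosine similarity over raw n-gram counts; the function names `char_ngrams` and `cosine_similarity` and the default `n = 3` are illustrative choices, not details taken from this record.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after a simple normalization."""
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    normalized = re.sub(r"[^\w\s]", "", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is shorter than n characters
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical fragments, chosen only to exercise the functions.
    suspicious = "international conference on computational linguistics"
    source = "conferencia internacional de linguistica computacional"
    print(cosine_similarity(char_ngrams(suspicious), char_ngrams(source)))
```

A measure of this kind depends on the two texts sharing surface character sequences, which is consistent with the abstract's finding that CL-CNG is not appropriate for less related languages.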

Year: 2010
Volume title: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
First page: 37
Last page: 45
Citation: Plagiarism detection across distant language pairs / Barron-Cedeno A.; Rosso P.; Agirre E.; Labaka G. - ELECTRONIC. - 2:(2010), pp. 37-45. (Paper presented at the 23rd International Conference on Computational Linguistics, Coling 2010, held in Beijing, China, in 2010).
All authors: Barron-Cedeno A.; Rosso P.; Agirre E.; Labaka G.
Publication type: 4.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709302

Warning: the data displayed have not been validated by the university.
