CRIS Current Research Information System


Barron-Cedeno A., Rosso P., Pinto D., Juan A. (2008). On cross-lingual plagiarism analysis using a statistical model.

On cross-lingual plagiarism analysis using a statistical model

Barron-Cedeno A.; Rosso P.; Pinto D.; Juan A.

2008

Abstract

The automatic detection of plagiarism has gained relevance in the Information Retrieval area. The task becomes more complex when plagiarism is committed in a multilingual setting, where the original and the suspicious texts are written in different languages. From a cross-lingual perspective, a text fragment in one language is considered a plagiarism of a text in another language if their contents are semantically similar, even though they are written in different languages, and the corresponding citation or credit is not included. Our current experiments on cross-lingual plagiarism analysis exploit a statistical bilingual dictionary. This dictionary is built from a parallel corpus containing original fragments written in one language and plagiarised versions of these fragments written in another language. Automatic plagiarism analysis based on this statistical bilingual dictionary has shown good results in the cross-lingual setting, and we believe it could also be useful for the cross-lingual near-duplicate detection task.
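The abstract does not spell out how the statistical bilingual dictionary is estimated or how it is used to compare fragments. As a minimal sketch, assuming the dictionary stores word-translation probabilities p(x | y) learned from the parallel corpus with the standard IBM Model 1 EM procedure (an assumption for illustration, not a description taken from this record), the two steps could look like the code below. The function names `train_dictionary` and `fragment_log_prob`, the `floor` smoothing constant, and the toy German-English corpus are all hypothetical.

```python
import math
from collections import defaultdict


def train_dictionary(pairs, iterations=10):
    """Hypothetical helper: estimate t[(x, y)] ~ p(x | y) with IBM Model 1 EM
    (no NULL word). Each pair is (original_tokens, plagiarised_tokens), so y is
    an original-language word and x a plagiarised-language word."""
    pairs = [(src, tgt) for src, tgt in pairs if src and tgt]
    if not pairs:
        return {}
    target_vocab = {x for _, tgt in pairs for x in tgt}
    # Uniform initialisation over every co-occurring (x, y) pair.
    t = {(x, y): 1.0 / len(target_vocab) for src, tgt in pairs for x in tgt for y in src}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(x, y)
        total = defaultdict(float)  # expected counts c(y)
        # E-step: share each target word among the source words that may explain it.
        for src, tgt in pairs:
            for x in tgt:
                z = sum(t[(x, y)] for y in src)
                for y in src:
                    c = t[(x, y)] / z
                    count[(x, y)] += c
                    total[y] += c
        # M-step: renormalise so that, for each y, p(. | y) sums to 1.
        t = {(x, y): c / total[y] for (x, y), c in count.items()}
    return t


def fragment_log_prob(suspicious, original, t, floor=1e-9):
    """Hypothetical scorer: average per-word log p(suspicious | original) under
    Model 1, with the length term omitted and unseen pairs floored."""
    if not suspicious or not original:
        return float("-inf")
    logp = 0.0
    for x in suspicious:
        p = sum(t.get((x, y), 0.0) for y in original) / len(original)
        logp += math.log(max(p, floor))
    return logp / len(suspicious)


# Toy parallel corpus: (original German fragment, English "plagiarised" version).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_dictionary(corpus, iterations=10)

# Rank candidate originals for one suspicious English fragment.
candidates = {"c1": ["das", "haus"], "c2": ["ein", "buch"]}
suspicious = ["the", "house"]
best = max(candidates, key=lambda k: fragment_log_prob(suspicious, candidates[k], t))
print(best)  # c1
```

Scoring every candidate original by the average per-word log-probability makes it possible to rank many candidates for a single suspicious fragment. A realistic system would still need tokenisation, a length model, and a decision threshold for flagging plagiarism; none of these are shown here.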

Full record

Year: 2008
Volume title: CEUR Workshop Proceedings
First page: 9
Last page: 13
Series: CEUR WORKSHOP PROCEEDINGS
Citation: Barron-Cedeno A., Rosso P., Pinto D., Juan A. (2008). On cross-lingual plagiarism analysis using a statistical model.
All authors: Barron-Cedeno A.; Rosso P.; Pinto D.; Juan A.
Appears in item types: 4.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
paper1.pdf (open access; Type: Publisher's version (PDF) / Version of Record; License: Open Access License, Creative Commons Attribution (CC BY))	131.49 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709316
