CRIS Current Research Information System


Cross-language high similarity search using a conceptual thesaurus

Gupta P.; Barron-Cedeno A.; Rosso P.

2012

Abstract

This work addresses cross-language high-similarity and near-duplicate search: given a document, the task is to identify a highly similar one in a large cross-language collection of documents. We propose a concept-based similarity model for this problem that is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. We compare our model with two state-of-the-art models and find that, although the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora. © 2012 Springer-Verlag.


Year	2012
Volume title	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First page	67
Last page	75
Series	Lecture Notes in Artificial Intelligence
DOI	https://dx.doi.org/10.1007/978-3-642-33247-0_8
Citation	Gupta P., Barron-Cedeno A., Rosso P. (2012). Cross-language high similarity search using a conceptual thesaurus [10.1007/978-3-642-33247-0_8].
All authors	Gupta P.; Barron-Cedeno A.; Rosso P.
Publication type	4.01 Contribution in conference proceedings

Files in this record:

File	Size	Format
BarronCedeno.pdf (open access; type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review; license: Open Access License, Creative Commons Attribution (CC BY))	182.89 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709275
