This work addresses the issue of cross-language high similarity and near-duplicates search, where, for the given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs English-German and English-Spanish using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models and we find, though the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora. © 2012 Springer-Verlag.

Gupta P., Barron-Cedeno A., Rosso P. (2012). Cross-language high similarity search using a conceptual thesaurus [10.1007/978-3-642-33247-0_8].

Cross-language high similarity search using a conceptual thesaurus

Barron-Cedeno A.;
2012

Abstract

This work addresses the issue of cross-language high similarity and near-duplicates search, where, for the given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs English-German and English-Spanish using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models and we find, though the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora. © 2012 Springer-Verlag.
2012
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
67
75
Gupta P., Barron-Cedeno A., Rosso P. (2012). Cross-language high similarity search using a conceptual thesaurus [10.1007/978-3-642-33247-0_8].
Gupta P.; Barron-Cedeno A.; Rosso P.
File in questo prodotto:
File Dimensione Formato  
BarronCedeno.pdf

accesso aperto

Tipo: Postprint
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 182.89 kB
Formato Adobe PDF
182.89 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/709275
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 26
  • ???jsp.display-item.citation.isi??? ND
social impact