Cross-language question re-ranking

Da San Martino G.; Romeo Salvatore; Barron-Cedeno A.; Joty S.; Marquez L.; Moschitti A.; Nakov P.

2017

Abstract

We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both systems lose performance when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results likewise improve, approaching those of the monolingual neural network. Overall, the kernel system outperforms the neural network in all cases.
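The idea of representing the Arabic input and the English related questions in the same space can be illustrated with a minimal sketch. It is not the authors' system: the abstract describes a feed-forward neural network over cross-language embeddings, whereas this example substitutes plain cosine similarity over averaged word vectors. The embedding table, its values, and the function names `embed`, `cosine`, and `rerank` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy cross-language embeddings (values invented for illustration,
# not taken from the paper). Words with similar meaning in Arabic and English
# are assumed to lie close together in the shared space.
EMBEDDINGS = {
    "تأشيرة": [0.90, 0.10, 0.00],  # Arabic: "visa"
    "تجديد": [0.10, 0.90, 0.00],   # Arabic: "renewal"
    "visa": [0.88, 0.12, 0.00],
    "renew": [0.12, 0.90, 0.05],
    "car": [0.00, 0.10, 0.95],
    "rent": [0.05, 0.20, 0.90],
}
DIM = 3


def embed(tokens):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_tokens, candidates):
    """Sort candidate questions by similarity to the query, most similar first."""
    query_vector = embed(query_tokens)
    scored = [(cosine(query_vector, embed(tokens)), tokens) for tokens in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# An Arabic query is compared directly with English candidates, without translation.
ranking = rerank(["تجديد", "تأشيرة"], [["rent", "car"], ["renew", "visa"]])
```

In this toy setting the English candidate about renewing a visa ranks above the one about renting a car, because the Arabic query and that candidate occupy nearby points in the shared space.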

Year: 2017

Volume title: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

First page: 1145

Last page: 1148

DOI: https://dx.doi.org/10.1145/3077136.3080743

Citation: Da San Martino G., Romeo Salvatore, Barron-Cedeno A., Joty S., Marquez L., Moschitti A., et al. (2017). Cross-language question re-ranking. 1515 BROADWAY, NEW YORK, NY 10036-9998 USA : Association for Computing Machinery, Inc [10.1145/3077136.3080743].

All authors: Da San Martino G.; Romeo Salvatore; Barron-Cedeno A.; Joty S.; Marquez L.; Moschitti A.; Nakov P.

Appears in types: 4.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709197
