A novel method for unsupervised and supervised conversational message thread detection

Domeniconi, G.; Semertzidis, K.; Lopez, V.; Daly, E. M.; Kotoulas, S.; Moro, G.

doi:10.5220/0006001100430054

Efficiently detecting conversation threads in a pool of messages, such as social network chats, emails, comments on posts and news, is relevant for various applications, including Web Marketing, Information Retrieval and Digital Forensics. Existing approaches focus on text similarity using keywords as features, which are strongly dependent on the dataset. Dealing with new corpora therefore requires further costly analyses by experts to find new relevant features. This paper introduces a novel method to detect threads in any type of conversational text, removing the need to determine specific features for each dataset in advance. To automatically determine the relevant features of messages, we map each message into a three-dimensional representation based on its semantic content, its social interactions in terms of sender/recipients, and its timestamp; clustering is then used to detect conversation threads. In addition, we propose a supervised approach that builds a classification model combining the extracted features to predict whether a pair of messages belongs to the same thread. Our model harnesses the distance between a message and a cluster representing a thread to capture the probability that the message is part of that thread. We present experimental results on seven datasets covering different types of messages and demonstrate the effectiveness of our method in detecting conversation threads, clearly outperforming the state of the art with an improvement of up to 19%.
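As a rough illustration of the unsupervised idea described in the abstract, the sketch below scores each pair of messages along three dimensions (text, participants, time) and groups messages whose combined score is high. It is not the authors' implementation: the `Message` type, the bag-of-words cosine, the Jaccard overlap, the exponential time decay, the equal weighting, the 0.5 threshold and the single-link grouping are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message record; field names are assumptions.
    text: str
    sender: str
    recipients: frozenset
    timestamp: float  # seconds


def text_similarity(a, b):
    # Cosine similarity over raw word counts (a crude stand-in for semantic content).
    ca, cb = Counter(a.text.lower().split()), Counter(b.text.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def social_similarity(a, b):
    # Jaccard overlap of participants (sender plus recipients).
    pa, pb = {a.sender} | a.recipients, {b.sender} | b.recipients
    return len(pa & pb) / len(pa | pb)


def time_similarity(a, b, scale=3600.0):
    # Exponential decay with the time gap; the one-hour scale is an assumption.
    return math.exp(-abs(a.timestamp - b.timestamp) / scale)


def pair_score(a, b):
    # Equal weighting of the three dimensions (assumed, not taken from the paper).
    return (text_similarity(a, b) + social_similarity(a, b) + time_similarity(a, b)) / 3.0


def detect_threads(messages, threshold=0.5):
    # Single-link grouping via union-find: link every pair scoring at least `threshold`.
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if pair_score(messages[i], messages[j]) >= threshold:
                parent[find(i)] = find(j)

    threads = {}
    for i in range(len(messages)):
        threads.setdefault(find(i), []).append(i)
    return sorted(threads.values())


msgs = [
    Message("lunch plans for friday", "alice", frozenset({"bob"}), 0.0),
    Message("friday lunch sounds good", "bob", frozenset({"alice"}), 600.0),
    Message("server outage report attached", "carol", frozenset({"dave"}), 100000.0),
    Message("outage report looks complete", "dave", frozenset({"carol"}), 100300.0),
]
print(detect_threads(msgs))  # [[0, 1], [2, 3]]
```

The same three per-pair scores could also serve as input features for a classifier that predicts whether two messages share a thread, which is the general shape of the supervised variant the abstract mentions; the choice of classifier is left open here.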

Domeniconi G., Semertzidis K., Lopez V., Daly E.M., Kotoulas S., Moro G. (2016). A novel method for unsupervised and supervised conversational message thread detection. AV D MANUELL, 27A 2 ESQ, SETUBAL, 2910-595, PORTUGAL : SciTePress [10.5220/0006001100430054].



Year: 2016
Volume title: DATA 2016 - Proceedings of the 5th International Conference on Data Management Technologies and Applications
First page: 43
Last page: 54
DOI: https://dx.doi.org/10.5220/0006001100430054
Publication type: 4.01 Conference proceedings contribution


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/778818
