Mining categorical sequences from data using a hybrid clustering method

De Angelis, Luca; Dias, José G.

doi:10.1016/j.ejor.2013.11.002

The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, and the social sciences. Compared with cross-sectional or static data, this type of observation (also known as stream data, temporal data, longitudinal data, or repeated measures) is more challenging because the clustering process has to account for data dependency. This research focuses on clustering categorical sequences. The proposed method combines model-based and heuristic clustering. In the first step, an extension of the hidden Markov model transforms the categorical sequences into a probabilistic space in which a symmetric Kullback–Leibler distance can operate. In the second step, the sequences are clustered by applying hierarchical clustering to the resulting matrix of distances. The paper illustrates the enormous potential of this type of hybrid approach using a synthetic data set, the well-known Microsoft data set of website users' search patterns, and a survey on job career dynamics.
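The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. As a stated assumption, the model-based step is replaced by a much simpler stand-in: each sequence is mapped to a smoothed distribution over first-order transitions instead of being passed through the paper's hidden Markov model extension. The function names (`transition_profile`, `symmetric_kl`, `average_linkage`), the smoothing constant `alpha`, the choice of average linkage, and the toy sequences are all hypothetical choices made for illustration. The symmetric Kullback–Leibler distance is taken here as the sum of the two directed divergences.

```python
import math
from itertools import combinations


def transition_profile(seq, states, alpha=1.0):
    """Stand-in for the model-based step (NOT the paper's HMM extension):
    a smoothed probability distribution over first-order transitions."""
    k = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Start every cell at alpha so that all probabilities are strictly positive.
    counts = [[alpha] * k for _ in range(k)]
    for a, b in zip(seq, seq[1:]):
        counts[idx[a]][idx[b]] += 1
    total = sum(sum(row) for row in counts)
    return [c / total for row in counts for c in row]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum((p - q) * log(p / q))."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of two clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Toy categorical sequences (hypothetical data): two "sticky", two alternating.
sequences = ["AAAAABAAAA", "AAAABAAAAA", "ABABABABAB", "BABABABABA"]
states = "AB"

# Step 1: map each sequence into a probabilistic space.
profiles = [transition_profile(s, states) for s in sequences]

# Step 2: build the distance matrix and cluster it hierarchically.
dist = [[symmetric_kl(p, q) for q in profiles] for p in profiles]
clusters = average_linkage(dist, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The sketch keeps the two stages separate on purpose: any model that turns a sequence into a strictly positive probability vector could replace `transition_profile` without touching the distance or clustering code.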

Luca De Angelis, José G. Dias (2014). Mining categorical sequences from data using a hybrid clustering method. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 234(3), 720-730 [10.1016/j.ejor.2013.11.002].


Year: 2014
Journal: EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
DOI: https://dx.doi.org/10.1016/j.ejor.2013.11.002
All authors: Luca De Angelis; José G. Dias
Publication type: 1.01 Journal article


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/226075