Frequent itemset (FI) mining aims at discovering relevant patterns from sets of transactions. In this work we focus on multi-level and multi-dimensional data, which provide a rich description of subjects through multiple features each at different levels of detail. Summarization of FIs has been only marginally addressed so far with specific reference to multi-level and multi-dimensional FIs. In this paper we fill this gap by proposing SUSHI, a framework for summarizing and visually exploring multi-level and multi-dimensional FIs. Specifically, SUSHI is based on (i) a similarity function for FIs which takes into account both their extensional (support-based) and intensional (feature-based) natures; (ii) theoretical results concerning antimonotonicity of support and similarity in multi-level settings, which allow us to propose an efficient clustering algorithm to generate hierarchical summaries; and (iii) two integrated approaches to summary visualization and exploration: a graph-based one, which highlights inter-cluster relationships, and a tree-based one, which emphasizes the relationships between the representative of each cluster and the other FIs in that cluster. SUSHI is evaluated using both a real and a synthetic dataset in terms of effectiveness, efficiency, and understandability of the summary, with reference to three different strategies for choosing cluster representatives. Overall, SUSHI shows to outperform previous approaches and to be a valuable tool to expedite the analysis of FIs. Besides, one of the three strategies for choosing cluster representatives shows to be the most effective one.

Summarization and Visualization of Multi-Level and Multi-Dimensional Itemsets

Matteo Francia;Matteo Golfarelli;Stefano Rizzi
2020

Abstract

Frequent itemset (FI) mining aims at discovering relevant patterns from sets of transactions. In this work we focus on multi-level and multi-dimensional data, which provide a rich description of subjects through multiple features each at different levels of detail. Summarization of FIs has been only marginally addressed so far with specific reference to multi-level and multi-dimensional FIs. In this paper we fill this gap by proposing SUSHI, a framework for summarizing and visually exploring multi-level and multi-dimensional FIs. Specifically, SUSHI is based on (i) a similarity function for FIs which takes into account both their extensional (support-based) and intensional (feature-based) natures; (ii) theoretical results concerning antimonotonicity of support and similarity in multi-level settings, which allow us to propose an efficient clustering algorithm to generate hierarchical summaries; and (iii) two integrated approaches to summary visualization and exploration: a graph-based one, which highlights inter-cluster relationships, and a tree-based one, which emphasizes the relationships between the representative of each cluster and the other FIs in that cluster. SUSHI is evaluated using both a real and a synthetic dataset in terms of effectiveness, efficiency, and understandability of the summary, with reference to three different strategies for choosing cluster representatives. Overall, SUSHI shows to outperform previous approaches and to be a valuable tool to expedite the analysis of FIs. Besides, one of the three strategies for choosing cluster representatives shows to be the most effective one.
Matteo Francia, Matteo Golfarelli, Stefano Rizzi
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11585/724973
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 2
social impact