Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) allows processing grocery transactions and the discovering of customer behaviours. Interpretations of topic models typically exploit individual samples overlooking the uncertainty of single topics. Moreover, training LDA multiple times show topics with large uncertainty, that is, topics (dis)appear in some but not all posterior samples, concurring with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as coherence and distinctiveness of single topics and credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also outperforms interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.

Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility / Mariflor Vega Carrasco; Ioanna Manolopoulou; Jason O'Sullivan; Rosie Prior; Mirco Musolesi. - In: JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS. - ISSN 0035-9254. - ELETTRONICO. - 71:3(2022), pp. 562-588. [10.1111/rssc.12546]

Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility

Mirco Musolesi
2022

Abstract

Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) allows processing grocery transactions and the discovering of customer behaviours. Interpretations of topic models typically exploit individual samples overlooking the uncertainty of single topics. Moreover, training LDA multiple times show topics with large uncertainty, that is, topics (dis)appear in some but not all posterior samples, concurring with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as coherence and distinctiveness of single topics and credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also outperforms interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.
2022
Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility / Mariflor Vega Carrasco; Ioanna Manolopoulou; Jason O'Sullivan; Rosie Prior; Mirco Musolesi. - In: JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS. - ISSN 0035-9254. - ELETTRONICO. - 71:3(2022), pp. 562-588. [10.1111/rssc.12546]
Mariflor Vega Carrasco; Ioanna Manolopoulou; Jason O'Sullivan; Rosie Prior; Mirco Musolesi
File in questo prodotto:
File Dimensione Formato  
jrsssc_71_3_562.pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 4.23 MB
Formato Adobe PDF
4.23 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/904470
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact