Although current state-of-the-art Transformer-based solutions have succeeded in a wide range of single-document NLP tasks, they still struggle with multi-input tasks such as multi-document summarization. Many solutions truncate the inputs, thus ignoring potentially summary-relevant content, which is unacceptable in the medical domain, where every piece of information can be vital. Others leverage linear model approximations to concatenate multiple inputs, which worsens the results because all information is considered, even when it is conflicting or noisy with respect to a shared background. Despite the importance and social impact of medicine, there are no ad-hoc solutions for multi-document summarization in the medical domain. For this reason, we propose a novel discriminative marginalized probabilistic method (DAMEN) trained to discriminate critical information from a cluster of topic-related medical documents and to generate a multi-document summary via token probability marginalization. Results show that DAMEN outperforms the previous state of the art on a biomedical dataset for multi-document summarization of systematic literature reviews. Moreover, we perform extensive ablation studies to justify the design choices and show the importance of each module of our method.
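As a rough illustration of "token probability marginalization", the sketch below assumes a RAG-token-style formulation, which this abstract does not confirm: a discriminator scores every document in the cluster, the top-k documents are kept, and the next-token distribution is p(y_t | y_<t, D) = Σ_{d ∈ top-k} p(d | D) · p(y_t | d, y_<t), where p(d | D) is a softmax over the kept scores. The function names, the scores and the per-document token distributions are hypothetical placeholders; the actual DAMEN formulation may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(doc_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring documents (hypothetical discriminator step).

    Ties keep the original document order because sorted() is stable.
    """
    return sorted(range(len(doc_scores)), key=lambda i: doc_scores[i], reverse=True)[:k]


def marginalize_next_token(
    doc_scores: Sequence[float],
    token_dists: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Mix per-document next-token distributions, weighting each by p(d | cluster).

    doc_scores[i]:  hypothetical discriminator score for document i
    token_dists[i]: hypothetical generator distribution p(y_t | d_i, y_<t) over the vocabulary
    """
    top = select_top_k(doc_scores, k)
    # p(d | cluster), renormalized over the kept top-k documents only
    weights = softmax([doc_scores[i] for i in top])
    vocab_size = len(token_dists[0])
    mixed = [0.0] * vocab_size
    for w, i in zip(weights, top):
        for v, p in enumerate(token_dists[i]):
            mixed[v] += w * p  # weighted sum over documents, token by token
    return mixed


if __name__ == "__main__":
    scores = [2.0, 0.5, 2.0]
    dists = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.5, 0.4, 0.1]]
    print(marginalize_next_token(scores, dists, k=2))
```

In this toy run, documents 0 and 2 tie on score, so each receives weight 0.5 and the mixed distribution is approximately [0.6, 0.3, 0.1]; the low-scoring document 1 is discarded before marginalization. Because the weights sum to one and each per-document distribution sums to one, the result is itself a valid distribution over the vocabulary.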

Gianluca Moro, Luca Ragazzi, Lorenzo Valgimigli, Davide Freddi (2022). Discriminative Marginalized Probabilistic Neural Method for Multi-Document Summarization of Medical Literature. Stroudsburg, PA 18360: The Association for Computational Linguistics. DOI: 10.18653/v1/2022.acl-long.15.

Discriminative Marginalized Probabilistic Neural Method for Multi-Document Summarization of Medical Literature

Gianluca Moro; Luca Ragazzi; Lorenzo Valgimigli; Davide Freddi
2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pages 180-189.
File in this record: 2022.acl-long.15.pdf (open access; type: publisher's version (PDF); license: Creative Commons Attribution (CC BY); 663.61 kB, Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/900380
Citations: Scopus 32; ISI (Web of Science) 21; PMC not available