Frisoni, G., Ragazzi, L., Cohen, D., Moro, G., Carbonaro, A., & Sartori, C. (2026). Abstractive Summarization through the Prism of Decoding Strategies. Neural Networks, 195, 1–32. https://doi.org/10.1016/j.neunet.2025.108249
Abstractive Summarization through the Prism of Decoding Strategies
Giacomo Frisoni (co-first author)
Luca Ragazzi (co-first author)
Gianluca Moro (co-first author)
Antonella Carbonaro (co-first author)
Claudio Sartori (co-first author)
2026
Abstract
In natural language generation, abstractive summarization (AS) is advancing rapidly due to transformer-based language models (LMs). Although decoding strategies strongly influence generated summaries, their impact is often overlooked. Given the abundance of token selection heuristics and associated hyperparameters, the community needs guidance to make well-informed decisions based on the specific task and target metrics. To address this gap, we conduct a comparative assessment of the effectiveness and efficiency of decoding-time techniques for short, long, and multi-document AS. We explore over 3,500 combinations involving three widely used million-scale autoregressive encoder-decoder LMs, two billion-scale decoder-only LMs, six datasets, and nine decoding settings. Our findings highlight that optimized decoding choices can lead to substantial performance improvements. Alongside human evaluation, we quantitatively measure effects using ten automatic metrics, covering dimensions such as semantic similarity, factuality, compression, redundancy, and carbon footprint. To set the stage for differentiable selection and optimization of decoding options, we introduce PRISM, a first-of-its-kind dataset that pairs AS gold input-output examples with our LM predictions across a diverse range of decoding options.
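The decoding settings the abstract refers to (e.g., greedy search, beam search, top-k and nucleus sampling, contrastive search) are exposed as generation-time hyperparameters in common toolkits. The snippet below is a minimal illustrative sketch of contrasting such configurations with Hugging Face Transformers' `generate()`; the checkpoint name and hyperparameter values are assumptions for demonstration, not the paper's experimental grid.

```python
# Illustrative sketch only: comparing a few common decoding strategies.
# The checkpoint and hyperparameter values are assumed, not the paper's setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # assumed summarization checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

document = "..."  # source document to summarize
inputs = tokenizer(document, return_tensors="pt", truncation=True)

# Each entry is one decoding configuration passed to generate();
# the values shown are typical defaults, not tuned choices.
decoding_configs = {
    "greedy": dict(do_sample=False, num_beams=1),
    "beam_search": dict(do_sample=False, num_beams=4),
    "top_k_sampling": dict(do_sample=True, top_k=50),
    "nucleus_sampling": dict(do_sample=True, top_p=0.9),
    "contrastive_search": dict(penalty_alpha=0.6, top_k=4),
}

for name, config in decoding_configs.items():
    output_ids = model.generate(**inputs, max_new_tokens=128, **config)
    summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"[{name}] {summary}")
```

Sweeping a grid of such configurations across models and datasets, and scoring each output with automatic metrics, is the kind of comparison the study describes at scale.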
| File | Size | Format | |
|---|---|---|---|
| 1-s2.0-S089360802501130X-main.pdf — open access. Type: publisher's version (Version of Record). License: Open Access, Creative Commons Attribution (CC BY) | 9.31 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.