Read Between the Tokens: Differentiable Text Pruning via Perturbed Top-k Selection

Paolo Italiani (co-first author); Luca Ragazzi (co-first author); Gianluca Moro (co-first author)
2025

Abstract

Transformer-based pretrained language models (PLMs) face scalability issues because their computational cost grows with the length of the input sequence, and they often struggle to maintain focus on relevant information. To mitigate this, we introduce PrunePert, a novel model featuring a learnable mechanism that identifies and removes uninformative tokens from the context. In doing so, our method not only reduces computational cost but also enhances interpretability, offering valuable insights into the tokens the model relies on in its decision-making process. Specifically, our approach employs a differentiable perturbed top-k token selection module within the transformer layers to prune a user-defined percentage of tokens. It can be integrated with any downstream PLM and trained end-to-end using backpropagation. We demonstrate the application of PrunePert in text summarization and classification tasks, utilizing both encoder-decoder PLMs and contemporary decoder-only large language models. Notably, our findings reveal that models equipped with PrunePert achieve up to 2x higher throughput with comparable performance in text summarization, while demonstrating superior performance in text classification tasks. Code is available at https://github.com/disi-unibo-nlp/prunepert.
Italiani, P., Ragazzi, L., Moro, G. (2025). Read Between the Tokens: Differentiable Text Pruning via Perturbed Top-k Selection. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 33, 4870-4879 [10.1109/TASLPRO.2025.3629289].
Files in this record:
Read_Between_the_Tokens_Differentiable_Text_Pruning_via_Perturbed_Top-k_Selection.pdf — open access; Version of Record (publisher PDF); license: Creative Commons Attribution (CC BY); 1.89 MB, Adobe PDF.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/11585/1027359