Diverse Transformer models are being explored for Contextual and Generative AI. While matrix-matrix and matrix-vector multiplications remain the invariant core operators, non-linear activation functions evolve rapidly and are emerging as a new bottleneck, especially as matrix-matrix and matrix-vector multiplications are increasingly accelerated. This calls for flexible hardware that can accelerate diverse non-linear operators, since software emulation is neither fast nor efficient enough. We present PACE, an open-source Polynomial Approximation Compute Engine that accelerates non-linear functions using optimal Piecewise Polynomial Approximation (PwPA) with arbitrary degrees and partitions. PACE achieves <0.2% accuracy loss on state-of-the-art (SoA) pretrained Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Large Language Models (LLMs) without fine-tuning. Integrated as an ISA extension into a RISC-V processing cluster, PACE incurs a 14% area overhead on the cluster. It delivers up to 1 FP32 PolyEval per cycle per core at 0.95 GHz, with an efficiency of 8 pJ/PolyEval. The PACE-augmented cluster achieves a 750× speedup over the C math.h library and 65× over software-based PwPA. PACE also delivers the highest throughput among general-purpose approximation hardware, 1.5× that of existing designs.
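The abstract does not describe how PACE fits or evaluates its polynomials. The following is only a minimal software sketch of piecewise polynomial approximation (PwPA) in general, and every choice in it is an assumption:

- GELU is the target non-linearity.
- The range [-4, 4] is split into eight uniform segments.
- Each segment uses a cubic polynomial.
- Arithmetic is done in double precision rather than FP32.

Coefficients are obtained by interpolating at Chebyshev nodes. This is a near-minimax stand-in, not necessarily the "optimal" fitting used in the paper. The names `build_table`, `poly_eval`, and `gelu_approx` are hypothetical. Each `poly_eval` call loosely mirrors one "PolyEval": select the segment containing x, then evaluate that segment's polynomial with Horner's rule.

```python
import math
from bisect import bisect_right


def gelu(x: float) -> float:
    """Reference GELU (exact erf form)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_segment(f, lo: float, hi: float, degree: int) -> list[float]:
    """Interpolate f at Chebyshev nodes of [lo, hi].

    Returns coefficients c0..cd of a polynomial in the local variable
    u = (x - mid) / half, with u in [-1, 1] (keeps the fit well-conditioned).
    """
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    n = degree + 1
    us = [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]
    vander = [[u ** j for j in range(n)] for u in us]
    return solve(vander, [f(mid + half * u) for u in us])


def build_table(f, lo: float, hi: float, segments: int, degree: int):
    """Hypothetical offline step: uniform breakpoints plus per-segment coefficients."""
    edges = [lo + (hi - lo) * i / segments for i in range(segments + 1)]
    coeffs = [fit_segment(f, a, b, degree) for a, b in zip(edges, edges[1:])]
    return edges, coeffs


def poly_eval(edges: list[float], coeffs: list[list[float]], x: float) -> float:
    """One evaluation: pick the segment holding x, then apply Horner's rule."""
    i = min(max(bisect_right(edges, x) - 1, 0), len(coeffs) - 1)
    a, b = edges[i], edges[i + 1]
    u = (x - (a + b) / 2) / ((b - a) / 2)
    acc = 0.0
    for c in reversed(coeffs[i]):  # d multiply-adds for degree d
        acc = acc * u + c
    return acc


def gelu_approx(x: float, edges: list[float], coeffs: list[list[float]]) -> float:
    # Hypothetical tail policy: GELU(x) ~ 0 far left and ~ x far right.
    if x < edges[0]:
        return 0.0
    if x > edges[-1]:
        return x
    return poly_eval(edges, coeffs, x)


edges, coeffs = build_table(gelu, -4.0, 4.0, segments=8, degree=3)
worst = max(abs(gelu_approx(k / 100, edges, coeffs) - gelu(k / 100)) for k in range(-600, 601))
print(f"max abs error on [-6, 6]: {worst:.2e}")
```

Horner's rule needs only d multiply-adds for a degree-d polynomial. Raising the degree or adding segments therefore trades coefficient-table size against accuracy without changing the evaluation loop itself.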

Prasad, A.S., İslamoğlu, G., Bertaccini, L., Rossi, D., Conti, F., Benini, L. (2025). PACE: An Optimal Piecewise Polynomial Approximation Unit for Flexible and Efficient Transformer Non-linearity Acceleration. 345 E 47th St, New York, NY 10017, USA: IEEE Computer Society. DOI: 10.1109/isvlsi65124.2025.11130197.

PACE: An Optimal Piecewise Polynomial Approximation Unit for Flexible and Efficient Transformer Non-linearity Acceleration


Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2025, pp. 1–6.
Prasad, Arpan Suravi; İslamoğlu, Gamze; Bertaccini, Luca; Rossi, Davide; Conti, Francesco; Benini, Luca

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1040835