
Arithmetic with language models: From memorization to computation

Maltoni, Davide; Ferrara, Matteo
2024

Abstract

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit significant input/output discontinuities, making smooth input interpolation ineffective for novel data. We successfully trained a lightweight language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding–Regression–Decoding machine, where the computation takes place in value space once the input token representation is mapped to an appropriate internal representation.
Maltoni, D., & Ferrara, M. (2024). Arithmetic with language models: From memorization to computation. Neural Networks, 179, 1–10. https://doi.org/10.1016/j.neunet.2024.106550
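
To make the testbed described in the abstract concrete, the sketch below shows one way binary addition could be framed as next-token prediction over a small vocabulary. The operand width, the "+", "=", and "$" separator/end tokens, and the character-level encoding are illustrative assumptions; the paper's exact input format and tokenizer are not reproduced here.

    # Illustrative sketch (not the paper's exact setup): binary addition
    # framed as next-token prediction over a five-symbol vocabulary.
    import random

    VOCAB = ["0", "1", "+", "=", "$"]  # "$" is an assumed end-of-sequence token

    def make_example(n_bits: int = 8) -> str:
        """Build one training string such as '00101101+01100001=010001110$'."""
        a = random.randrange(2 ** n_bits)
        b = random.randrange(2 ** n_bits)
        # The sum of two n-bit numbers always fits in n + 1 bits.
        return f"{a:0{n_bits}b}+{b:0{n_bits}b}={a + b:0{n_bits + 1}b}$"

    def encode(text: str) -> list[int]:
        """Map each character to its integer token id."""
        return [VOCAB.index(ch) for ch in text]

    if __name__ == "__main__":
        random.seed(0)
        example = make_example()
        print(example)          # e.g. '11000110+01010111=100011101$'
        print(encode(example))  # id sequence a next-token model would train on

Under this framing the model must emit the result bits one token at a time, so correct answers on operand pairs never seen in training probe genuine computation rather than memorized retrieval, which is the extrapolation setting the paper studies.
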
Files in this item:

File: 1-s2.0-S089360802400474X-main.pdf
Access: Open access
Type: Publisher's version (PDF)
License: Open Access license. Creative Commons Attribution (CC BY)
Size: 2.19 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/975900