
Benchmarking large language models for handwritten text recognition

Crosilla G. (first author); Colavizza G. (last author)
2025

Abstract

Purpose: This work provides an overview of the current capabilities of Multimodal Large Language Models (MLLMs) for Handwritten Text Recognition (HTR), assessing their potential against traditional task-specific, supervised models.
Design/methodology/approach: A set of openly available benchmarks is used to compare different LLMs with strong task-specific supervised baselines on HTR.
Findings: The results show that LLMs currently perform strongly on English texts, but more weakly on languages other than English, and lack a significant capability for self-correction. Moreover, comparison with Transkribus's models highlights that proprietary LLMs perform best, in particular on modern handwriting, while for historical documents the overall performance comparison between LLMs and Transkribus is inconsistent.
Originality/value: The authors are not aware of a similar study relying on open benchmarks.
Crosilla, G., Klic, L., Colavizza, G. (2025). Benchmarking large language models for handwritten text recognition. JOURNAL OF DOCUMENTATION, 81(7), 334-354 [10.1108/JD-03-2025-0082].
Crosilla, G.; Klic, L.; Colavizza, G.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1032170
Warning! The displayed data have not been validated by the university.

Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science (ISI): 0