Automatic pattern detection has become increasingly important for scholars in the humanities as the number of manuscripts that have been digitised has grown. Most of the state-of-the-art methods used for pattern detection depend on the availability of a large number of training samples, which are typically not available in the humanities as they involve tedious manual annotation by researchers (e.g. marking the location and size of words, drawings, seals and so on). This makes the applicability of such methods very limited within the field of manuscript research. We propose a learning-free approach based on a state-of-the-art Naive Bayes Nearest-Neighbour classifier for the task of pattern detection in manuscript images. The method has already been successfully applied to an actual research question from South Asian studies about palm-leaf manuscripts. Furthermore, state-of-the-art results have been achieved on two extremely challenging datasets, namely the AMADI_LontarSet dataset of handwriting on palm leaves for word-spotting and the DocExplore dataset of medieval manuscripts for pattern detection. A performance analysis is provided as well in order to facilitate later comparisons by other researchers. Finally, an easy-to-use implementation of the proposed method is developed as a software tool and made freely available.

Mohammed, H., Märgner, V., Ciotti, G. (2021). Learning-free pattern detection for manuscript research: An efficient approach toward making manuscript images searchable. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 24(3), 167-179 [10.1007/s10032-021-00371-7].

Learning-free pattern detection for manuscript research: An efficient approach toward making manuscript images searchable

Ciotti, G
Ultimo
2021

Abstract

Automatic pattern detection has become increasingly important for scholars in the humanities as the number of manuscripts that have been digitised has grown. Most of the state-of-the-art methods used for pattern detection depend on the availability of a large number of training samples, which are typically not available in the humanities as they involve tedious manual annotation by researchers (e.g. marking the location and size of words, drawings, seals and so on). This makes the applicability of such methods very limited within the field of manuscript research. We propose a learning-free approach based on a state-of-the-art Naive Bayes Nearest-Neighbour classifier for the task of pattern detection in manuscript images. The method has already been successfully applied to an actual research question from South Asian studies about palm-leaf manuscripts. Furthermore, state-of-the-art results have been achieved on two extremely challenging datasets, namely the AMADI_LontarSet dataset of handwriting on palm leaves for word-spotting and the DocExplore dataset of medieval manuscripts for pattern detection. A performance analysis is provided as well in order to facilitate later comparisons by other researchers. Finally, an easy-to-use implementation of the proposed method is developed as a software tool and made freely available.
2021
Mohammed, H., Märgner, V., Ciotti, G. (2021). Learning-free pattern detection for manuscript research: An efficient approach toward making manuscript images searchable. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 24(3), 167-179 [10.1007/s10032-021-00371-7].
Mohammed, H; Märgner, V; Ciotti, G
File in questo prodotto:
File Dimensione Formato  
Learning-free pattern detection for manuscript research.pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 2.23 MB
Formato Adobe PDF
2.23 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/945319
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 2
social impact