Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Fedotova, Alice; Ferraresi, Adriano; Miličević Petrović, Maja; Barrón-Cedeño, Alberto
2024

Abstract

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art automatic speech recognition tools for verbatim transcription. The pipeline was developed during the process of updating the European Parliament Translation and Interpreting Corpus (EPTIC), leveraging recent NLP advancements to automate challenging tasks like multilingual alignment and speech recognition. Our findings indicate that current technologies can streamline corpus construction, with fine-tuning showing promising results in terms of transcription quality compared to out-of-the-box Whisper models. The lowest overall WER achieved for English was 0.180, using a fine-tuned Whisper-small model. As for Italian, the lowest WER (0.152) was obtained by the Whisper Large-v2 model, with the fine-tuned Whisper-small model still outperforming the baseline (0.201 vs. 0.219).
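
For reference, word error rate (WER) is computed as the number of word-level substitutions, deletions, and insertions divided by the number of words in the reference transcript. Below is a minimal sketch of such a computation in Python, assuming the open-source jiwer library (an illustration only; the paper's actual evaluation code is not part of this record):

    # Toy WER computation: WER = (S + D + I) / N,
    # where N is the reference word count.
    import jiwer

    reference = "the committee approved the proposal"  # verbatim transcript
    hypothesis = "the committee approve proposal"      # ASR output

    # jiwer aligns the two word sequences and counts edit operations:
    # 1 substitution (approved -> approve) + 1 deletion (the) over 5 words.
    print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")  # -> WER: 0.400
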
Published in: Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pp. 1–7

Citation: Fedotova, A., Ferraresi, A., Miličević Petrović, M., & Barrón-Cedeño, A. (2024). Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription. In Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pp. 1–7.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1001064