

Open-Source Conversational AI with SpeechBrain 1.0

Ravanelli M.; Parcollet T.; Moumen A.; de Langen S.; Subakan C.; Plantinga P.; Wang Y.; Mousavi P.; Libera L. D.; Ploujnikov A.; Paissan F.; Borra D.; Zaiem S.; Zhao Z.; Zhang S.; Karakasidis G.; Yeh S.-L.; Champion P.; Rouhe A.; Braun R.; Mai F.; Zuluaga-Gomez J.; Mousavi S. M.; Nautsch A.; Nguyen H.; Liu X.; Sagar S.; Duret J.; Mdhaffar S.; Laperriere G.; Rouvier M.; De Mori R.; Esteve Y.

2024

Abstract

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete “recipes” of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.


Year: 2024

Journal: JOURNAL OF MACHINE LEARNING RESEARCH

Citation: Ravanelli, M., Parcollet, T., Moumen, A., De Langen, S., Subakan, C., Plantinga, P., et al. (2024). Open-Source Conversational AI with SpeechBrain 1.0. JOURNAL OF MACHINE LEARNING RESEARCH, 25, 1-11.

All authors: Ravanelli, M.; Parcollet, T.; Moumen, A.; De Langen, S.; Subakan, C.; Plantinga, P.; Wang, Y.; Mousavi, P.; Libera, L. D.; Ploujnikov, A.; Paissan, F. ...


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1049178

Warning: the data displayed have not been validated by the university.
