Speech enhancement combining optimal smoothing and errors-in-variables identification of noisy AR processes

Diversi, Roberto; Guidorzi, Roberto; Soverini, Umberto
2007

Abstract

In the framework of speech enhancement, several parametric approaches based on an a priori model for a speech signal have been proposed. When using an autoregressive (AR) model, three issues must be addressed. 1) How to deal with AR parameter estimation? Indeed, due to additive noise, the standard least squares criterion leads to biased estimates of the AR parameters. 2) Can an estimate of the variance of the additive noise be obtained for each speech frame? A voice activity detector is often used for this purpose. 3) Which estimation rules and techniques (filtering, smoothing, etc.) can be considered to retrieve the speech signal? Our contribution in this paper is threefold. First, we propose to view the identification of the noisy AR process as an errors-in-variables problem. This blind method has the advantage of providing accurate estimates of both the AR parameters and the variance of the additive noise. Second, we propose an alternative algorithm to standard Kalman smoothing, based on a constrained minimum variance estimation procedure with a lower computational cost. Third, the combination of these two steps is investigated. It provides better results than some existing speech enhancement approaches in terms of signal-to-noise ratio (SNR), segmental SNR, and informal subjective tests.
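The bias mentioned in issue 1) can be illustrated with a small numerical sketch. It is not the errors-in-variables method of the paper: it simulates an AR(2) process with arbitrarily chosen coefficients, adds white noise, and compares a plain Yule-Walker (least squares type) fit with a noise-compensated fit that assumes the noise variance is already known. The compensation relies on the standard fact that additive white noise only inflates the lag-0 autocorrelation. All function names, coefficients, and the 5 dB SNR are assumptions made for illustration.

```python
import numpy as np


def simulate_ar(coeffs, n, rng, burn_in=500):
    """Simulate x[t] = sum_k coeffs[k] * x[t-1-k] + u[t] with unit-variance u."""
    p = len(coeffs)
    total = n + burn_in
    x = np.zeros(total)
    u = rng.standard_normal(total)
    for t in range(p, total):
        # x[t-p:t][::-1] is (x[t-1], x[t-2], ..., x[t-p])
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + u[t]
    return x[burn_in:]  # drop the transient


def autocorr(y, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    n = len(y)
    return np.array([np.dot(y[:n - k], y[k:]) / n for k in range(max_lag + 1)])


def yule_walker(r, noise_var=0.0):
    """Solve the Yule-Walker equations; noise_var is subtracted from lag 0."""
    p = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    R = R - noise_var * np.eye(p)  # white noise only affects the diagonal
    return np.linalg.solve(R, r[1:])


def demo(n=200_000, snr_db=5.0, seed=0):
    rng = np.random.default_rng(seed)
    true_a = np.array([1.5, -0.7])  # hypothetical stable AR(2) model
    x = simulate_ar(true_a, n, rng)
    noise_var = np.var(x) / 10 ** (snr_db / 10)
    y = x + np.sqrt(noise_var) * rng.standard_normal(n)  # noisy observation
    r = autocorr(y, len(true_a))
    a_ls = yule_walker(r)               # ignores the noise: biased
    a_comp = yule_walker(r, noise_var)  # assumes noise_var is known
    return true_a, a_ls, a_comp


true_a, a_ls, a_comp = demo()
print("true        :", true_a)
print("uncorrected :", a_ls)
print("compensated :", a_comp)
```

In this sketch the compensated fit needs the noise variance as an input, which is exactly the quantity that issue 2) asks about; the abstract states that the errors-in-variables formulation estimates it jointly with the AR parameters instead of relying on a voice activity detector.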
W. Bobillet; R. Diversi; E. Grivel; R. Guidorzi; M. Najim; U. Soverini
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/51674