W. Bobillet, R. Diversi, E. Grivel, R. Guidorzi, M. Najim, U. Soverini (2007). Speech enhancement combining optimal smoothing and errors-in-variables identification of noisy AR processes. IEEE Transactions on Signal Processing, 55, 5564-5578, doi:10.1109/TSP.2007.898787.
Speech enhancement combining optimal smoothing and errors-in-variables identification of noisy AR processes
Diversi, Roberto; Guidorzi, Roberto; Soverini, Umberto
2007
Abstract
In the framework of speech enhancement, several parametric approaches based on an a priori model of the speech signal have been proposed. When an autoregressive (AR) model is used, three issues must be addressed. 1) How should the AR parameters be estimated? Because of the additive noise, the standard least-squares criterion yields biased estimates of the AR parameters. 2) Can the variance of the additive noise be estimated for each speech frame? A voice activity detector is often used for this purpose. 3) Which estimation rules and techniques (filtering, smoothing, etc.) can be used to retrieve the speech signal? Our contribution in this paper is threefold. First, we propose to view the identification of the noisy AR process as an errors-in-variables problem. This blind method has the advantage of providing accurate estimates of both the AR parameters and the variance of the additive noise. Second, we propose an alternative to standard Kalman smoothing, based on a constrained minimum-variance estimation procedure with a lower computational cost. Third, we investigate the combination of these two steps, which provides better results than some existing speech enhancement approaches in terms of signal-to-noise ratio (SNR), segmental SNR, and informal subjective tests.
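The first issue above, the bias of least squares on a noisy AR process, can be illustrated numerically. If the observation is y(t) = x(t) + w(t) with w white and uncorrelated with x, the LS normal matrix converges to R_x + σ_w² I while the right-hand side converges to r_x, so only the diagonal is perturbed. The sketch below is not the authors' errors-in-variables algorithm. It uses the simpler bias-compensated least-squares correction, which assumes σ_w² is already known, whereas the abstract states that the errors-in-variables method estimates it jointly with the AR parameters. NumPy, the AR(2) test coefficients, and the helper names (`simulate_noisy_ar`, `regressors`, `ls_ar`, `bias_compensated_ls_ar`) are illustrative assumptions.

```python
import numpy as np


def simulate_noisy_ar(a, n, sigma_e, sigma_w, seed=0, burn=500):
    """Hypothetical helper: clean x(t) = sum_k a[k-1] x(t-k) + e(t), noisy y(t) = x(t) + w(t)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = len(a)
    e = rng.normal(0.0, sigma_e, n + burn)  # white driving noise
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        # x[t-p:t][::-1] is [x(t-1), ..., x(t-p)]
        x[t] = a @ x[t - p:t][::-1] + e[t]
    x = x[burn:]  # drop the start-up transient
    y = x + rng.normal(0.0, sigma_w, n)  # assumed additive white observation noise
    return x, y


def regressors(y, p):
    """Rows [y(t-1), ..., y(t-p)] paired with targets y(t), for t = p, ..., n-1."""
    n = len(y)
    phi = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    return phi, y[p:]


def ls_ar(y, p):
    """Standard least squares on the noisy data (biased when sigma_w > 0)."""
    phi, target = regressors(y, p)
    return np.linalg.lstsq(phi, target, rcond=None)[0]


def bias_compensated_ls_ar(y, p, noise_var):
    """Remove the assumed-known noise variance from the diagonal of the LS normal matrix."""
    phi, target = regressors(y, p)
    m = len(target)
    normal_matrix = phi.T @ phi / m - noise_var * np.eye(p)
    return np.linalg.solve(normal_matrix, phi.T @ target / m)


a_true = np.array([1.5, -0.8])  # stable AR(2) with a spectral resonance (illustrative choice)
x, y = simulate_noisy_ar(a_true, n=100_000, sigma_e=1.0, sigma_w=2.0, seed=1)
a_ls = ls_ar(y, p=2)
a_bc = bias_compensated_ls_ar(y, p=2, noise_var=2.0 ** 2)
print("true:", a_true, "LS:", a_ls.round(3), "bias-compensated:", a_bc.round(3))
```

With these settings the SNR is about 3.6 dB, and in the large-sample limit the plain LS estimate tends to roughly (0.60, -0.03), far from the true (1.5, -0.8), while the compensated estimate lands near the true values. The correction only works if σ_w² is available, which is why a method that estimates the noise variance per frame, without relying on a voice activity detector, matters in practice.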