Windmann A., Wagner P., Tamburini F., Arnold D., Oertel C. (2010). Automatic Prominence Annotation for German Speech Synthesis Corpus: Towards Prominence-Based Prosody Generator for Unit Selection Synthesis. KYOTO : s.n.
Automatic Prominence Annotation for German Speech Synthesis Corpus: Towards Prominence-Based Prosody Generator for Unit Selection Synthesis
TAMBURINI, FABIO;
2010
Abstract
This paper describes work directed towards the development of a syllable-prominence-based prosody generation functionality for the German unit selection speech synthesis system. A general concept for syllable-prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and against manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.

... exploiting the inherent variation within the speech data. For example, a specific predicted pitch target for a certain position within an utterance to be synthesized might not be available in the speech corpus. In a prominence-based framework, this could be compensated for by selecting a unit that exhibits a high value for another prosodic parameter at this position, resulting in the same abstract level of perceptual salience that the pitch target would have assigned. The prominence-based approach could thus be an efficient way of modeling prosody, although there are certainly contexts in which a specific pitch profile is crucial. A possible architecture for syllable-prominence-based ...
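The compensation idea in the excerpt above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Unit` fields, the `WEIGHTS` values, and the linear scoring in `prominence` are all assumptions chosen for illustration, and the parameter values are imagined to be z-scored. The sketch only shows how a unit with no pitch excursion, but with high duration and intensity, can reach the same abstract prominence level as a unit that carries the predicted pitch target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Candidate syllable unit with z-scored prosodic parameters (hypothetical fields)."""
    name: str
    pitch: float
    duration: float
    intensity: float


# Illustrative weights only; these values are assumptions, not taken from the paper.
WEIGHTS = {"pitch": 0.4, "duration": 0.3, "intensity": 0.3}


def prominence(unit: Unit) -> float:
    """Collapse several prosodic parameters into one abstract prominence score."""
    return (WEIGHTS["pitch"] * unit.pitch
            + WEIGHTS["duration"] * unit.duration
            + WEIGHTS["intensity"] * unit.intensity)


def select_unit(candidates: list[Unit], target: float) -> Unit:
    """Pick the candidate whose prominence score is closest to the target."""
    return min(candidates, key=lambda u: abs(prominence(u) - target))


# No candidate in this toy corpus carries the desired pitch excursion.
candidates = [
    Unit("flat", pitch=0.0, duration=0.0, intensity=0.0),
    Unit("long_loud", pitch=0.0, duration=2.0, intensity=2.0),
]

# Prominence that the (unavailable) pitch target alone would have produced.
target = prominence(Unit("ideal_pitch", pitch=3.0, duration=0.0, intensity=0.0))

best = select_unit(candidates, target)
print(best.name)  # prints "long_loud"
```

Selecting on a single prominence score rather than on each parameter separately is what lets duration and intensity stand in for the missing pitch target. As the excerpt notes, this would not be adequate in contexts where a specific pitch profile is crucial.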