Modeling durational patterns in connected discourse

Grant to Caroline Smith from the National Science Foundation, May 2000 to April 2003

NSF information on this grant

Project summary

The speech synthesis systems now available can produce speech whose segmental properties mimic human speech remarkably well. However, synthetic speech still often sounds unnatural, primarily because its prosodic properties, such as intonation and timing, are modeled less successfully. This project focuses on the prosodic properties of longer stretches of speech: sequences of sentences and paragraphs. The goal is to explore how prosody, specifically duration, is organized over these larger units of speech.

The approach adopted here is to compare languages with known differences in timing at the segmental, word, or phrase level, to see how they differ in structuring timing over an entire discourse. The languages to be compared are English, French, and Japanese. Improved understanding of discourse-level timing will help to explain the relation between durational patterns and linguistic structure.

The project is organized into four phases: (1) a production study to investigate what factors influence durational patterns in longer stretches of speech; (2) modeling of the results of the production study; (3) analysis-synthesis incorporating the model of durations into synthesized speech; and (4) testing the resulting synthesis through evaluation by human listeners. The durational properties to be examined include the amount of lengthening at sentence boundaries, the duration of pauses, and changes in speech rate before and after a boundary. Comparisons among the three languages will identify the factors that most strongly influence durations, and help explain how these factors relate to the different linguistic structures of the languages.
