Perception, Analysis and Synthesis of Speaker Age

Author

Summary, in English

Speaker age is an important paralinguistic feature in speech which has to be considered in the study of phonetic variation. Knowledge about this feature may be used to improve speech technology applications, e.g. automatic speech recognition and speech synthesis. The present thesis describes six studies of several phonetic aspects of age-related variation in speech.



As the speech production mechanism changes from young adulthood to old age, speech is affected in numerous ways. Human perception of speaker age is based on cues such as pitch, speech rate and voice quality, and is fairly accurate. However, it is still unclear which cues are the most important ones. The first study included in this thesis investigated the role of F0 and speech rate (word duration) in age perception. It was found that while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age.



In the second study, two stimulus types of various lengths were compared. Results indicated that while longer stimulus duration (regardless of speech type) seems to improve age estimation for female speakers, spontaneous speech (regardless of duration) appears to contain more important cues for the perception of male speaker age.



In the next two studies, several automatic estimators of speaker age were built, none of which reached the same accuracy as humans. Important features in machine perception of age were also investigated. It was found that prosodic features seem to be more important in the estimation of female age, while spectral features (e.g. F2) appear to be more important for male age.



Although several acoustic correlates of speaker age are known, their relative importance has not yet been established. The next study analysed 161 features, automatically extracted from segments in six words produced by 527 speakers. Normalised means were used to ensure that the features could be compared directly. The most important acoustic correlates of speaker age were identified as speech rate (segment duration) and intensity range. However, F0 and some spectral measures (e.g. F1 and F2) may also, if used in combination with other features, be important correlates of age.
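The idea of comparing features on a common scale can be illustrated with a minimal sketch. It assumes z-score normalisation (zero mean, unit standard deviation per feature); the summary does not state which normalisation the thesis actually used, and the function names, age groups and toy values below are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalise(values):
    """Scale one feature to zero mean and unit (population) standard deviation.

    Assumption: z-scores stand in for the unspecified normalisation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalised_group_means(values, groups):
    """Return the mean normalised value of one feature for each group label."""
    scores = z_normalise(values)
    by_group = {}
    for score, label in zip(scores, groups):
        by_group.setdefault(label, []).append(score)
    return {label: mean(group_scores) for label, group_scores in by_group.items()}


# Hypothetical toy data: four speakers, two features measured in different units.
durations_ms = [80.0, 90.0, 110.0, 120.0]
f0_hz = [210.0, 200.0, 190.0, 180.0]
age_groups = ["young", "young", "old", "old"]

duration_means = normalised_group_means(durations_ms, age_groups)
f0_means = normalised_group_means(f0_hz, age_groups)

print(duration_means)
print(f0_means)
```

After normalisation both features are unitless, so the size of the difference between age groups can be compared across features even though one was measured in milliseconds and the other in hertz.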



Synthetic speech may sound more natural if speaker age is included as a parameter. The final study developed a research tool which used data-driven formant synthesis and age-weighted linear interpolation to simulate an age between the ages of any two of four differently aged female reference speakers. Evaluation of the tool showed that speaker age may in fact be simulated using formant synthesis. The tool will be used in further studies of analysis by synthesis of speaker age.
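Age-weighted linear interpolation between two reference speakers can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the weight is taken to be w = (target age - younger age) / (older age - younger age), each synthesis parameter is computed as (1 - w) * younger value + w * older value, and the function name, parameter names and numeric values are hypothetical.

```python
def interpolate_parameters(target_age, younger, older):
    """Interpolate synthesis parameters for an age between two reference speakers.

    `younger` and `older` are (age, parameters) pairs, where `parameters`
    maps a parameter name to a numeric value. Both speakers are assumed to
    provide the same parameter names.
    """
    age_y, params_y = younger
    age_o, params_o = older
    if not age_y < age_o:
        raise ValueError("the younger reference speaker must have the lower age")
    if not age_y <= target_age <= age_o:
        raise ValueError("target age must lie between the two reference ages")

    # Weight of the older speaker: 0 at the younger age, 1 at the older age.
    w = (target_age - age_y) / (age_o - age_y)
    return {
        name: (1 - w) * params_y[name] + w * params_o[name]
        for name in params_y
    }


# Hypothetical reference speakers and parameter values, for illustration only.
younger_ref = (20, {"F0": 220.0, "F1": 700.0})
older_ref = (80, {"F0": 180.0, "F1": 640.0})

simulated = interpolate_parameters(50, younger_ref, older_ref)
print(simulated)
```

A target age equal to one of the reference ages returns that speaker's parameters unchanged, which gives a simple sanity check for such a scheme.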

Department/s

Publishing year

2006

Language

English

Publication/Series

Travaux de l'Institut de Linguistique de Lund

Volume

47

Document type

Dissertation

Publisher

Linguistics and Phonetics

Topic

  • General Language Studies and Linguistics

Keywords

  • phonology
  • perceptual cues
  • speaker age
  • automatic speaker recognition
  • acoustic analysis
  • acoustic correlates
  • data-driven
  • Phonetics
  • formant synthesis
  • Fonetik
  • fonologi
  • Technological sciences
  • Teknik

Status

Published

Project

  • SweDia 2000

Supervisor

  • Per Lindblad

ISBN/ISSN/Other

  • ISSN: 0347-2558
  • ISBN: 91-974116-4-7

Defence date

2 December 2006

Defence time

13:15

Defence place

Hörsalen, Humanisthuset, Språk- och Litteraturcentrum, Helgonabacken 12, Lund

Opponent

  • Bernd Möbius (Associate Professor)