SpiN 2018 :: program

How the tongue and lips produce clear speech: CVC words with randomised vowels, transcribed by listeners in normal and noisy conditions

James M Scobbie^(a), Joan Ma
Queen Margaret University, Scotland

(a) Presenting

In noisy conditions, speakers adapt their speech production in a number of ways, which can usefully be studied through acoustic analysis or perceptual testing of the speech output. Noise may be audible to the speaker alone, to both speaker and listener, or to the listener alone, and in all cases the speaker may enhance aspects of their speech for the benefit of the listener. These adaptations have a variety of effects, such as an expanded vowel space. One hypothesis is that speakers alter their supralaryngeal articulations in order to enhance formant values, making vowels perceptually more distinct from each other.

Our focus is on the enhancement of speech production that occurs when the listener’s hearing is artificially masked by speech babble noise (40 dB), as a model of how speech production may change when talking to hearing-impaired listeners.

Our previous pilot work on a single speaker found that tongue and lip articulations were indeed different in a noisy condition. The materials were single /b/+V+/p/ CVC words that varied only in the vowel, which was one of six monophthongs, fully randomised over six productions. Some vowels, such as /ʉ/, appeared to change little, while the high front vowels /i/ and /e/ were slightly retracted. Though the initial /b/ and final /p/ were identical every time, in the noisy condition the /b/ was clearly hyper-articulated while the /p/ was not. For this full paper we will analyse six speakers of Scottish English to look for general patterns.

We will use Ultrasound Tongue Imaging and a lip-jaw camera, both synchronised with the acoustics, to describe the ways in which different speakers expand their vowel space. Ultrasound provides a mid-sagittal image of the tongue surface from near the tip down to near the root. It is cheap and quiet, and it provides dynamic images at a high frame rate. The camera is mounted on a headset worn by the speaker, which also holds the ultrasound probe steady.

We will analyse the difference between the two production conditions by tracing the tongue surface in each vowel, calculating the average position of the target, and comparing vowel shape and location across conditions. In addition, the Bark-transformed area of the F1/F2 acoustic vowel space will be measured.
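As an illustration of the acoustic measure, the following is a minimal sketch of one way a Bark-transformed F1/F2 vowel space area could be computed. It rests on several assumptions not stated in the abstract: the Hz-to-Bark conversion uses the Traunmüller (1990) approximation, the area is taken as that of the convex hull of the per-vowel mean formants, and the function names, vowel labels V1 to V6, and formant values are invented placeholders rather than data from this study.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hz_to_bark(f_hz: float) -> float:
    """Convert Hz to Bark (Traunmueller 1990 approximation; an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_space_area(mean_formants_hz: Dict[str, Point]) -> float:
    """Convex-hull area, in Bark squared, of per-vowel mean (F1, F2) in Hz."""
    bark_points = [(hz_to_bark(f1), hz_to_bark(f2))
                   for f1, f2 in mean_formants_hz.values()]
    return polygon_area(convex_hull(bark_points))


if __name__ == "__main__":
    # Placeholder (F1, F2) means in Hz for six vowels; NOT measured data.
    normal = {"V1": (300, 2300), "V2": (400, 2100), "V3": (650, 1700),
              "V4": (750, 1300), "V5": (500, 900), "V6": (320, 1500)}
    noisy = {"V1": (290, 2400), "V2": (410, 2150), "V3": (700, 1750),
             "V4": (820, 1300), "V5": (520, 850), "V6": (320, 1500)}
    print("normal:", vowel_space_area(normal))
    print("noisy: ", vowel_space_area(noisy))
```

A convex hull is used here so that the result does not depend on the order in which the vowels are listed. A fixed polygon through designated corner vowels would be an equally plausible choice, and the two can give different areas when a vowel lies inside the hull.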

Last modified 2017-11-17 15:56:08