The effect of acoustic and semantic cues on speech recognition in noise
Previous work has shown that listeners perform better in speech-in-noise tasks when the target speech has been produced clearly (e.g., Pichora-Fuller, Goy, & Van Lieshout, 2010) and when the speech signal contains sentence-level contextual information (e.g., Bradlow & Alexander, 2007; Smiljanic & Sladen, 2013). The benefit from clear speech modifications and from semantic contextual cues is modulated by several factors, including the type and level of masking noise (e.g., Calandruccio et al., 2010; Payton et al., 1994) and listeners’ experience with the target language (e.g., Mayo, Florentine, & Buus, 1997). While previous research has examined these factors individually, few studies have directly compared the intelligibility benefits of semantic and acoustic cues or their interaction with different types and levels of noise maskers.
The first goal of the current study was to explore to what extent listeners benefit from acoustic-phonetic and semantic intelligibility-enhancing cues. The second goal was to explore how these acoustic and semantic cues interact with energetic and informational masking at different signal-to-noise ratios (SNRs). In two experiments, native English listeners heard meaningful English sentences produced as noise-adapted speech (NAS) or clear speech (CS), mixed with speech-shaped noise (SSN), two-talker (2T) babble, or six-talker (6T) babble, and presented at -5 dB or -7 dB SNR. In Experiment 1, listeners heard sentences in which the final word was predicted by the preceding words (high-predictability sentences). In Experiment 2, a different group of listeners heard sentences in which the final word could not be predicted from the preceding words (low-predictability sentences).
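This section does not describe how targets and maskers were combined, but a common procedure is to scale the masker so that the target-to-masker level ratio, computed from RMS energy, equals the desired SNR in dB. The short Python sketch below illustrates that standard procedure under this assumption; the function name mix_at_snr and the placeholder signals are hypothetical and are not taken from the study.

    import numpy as np

    def mix_at_snr(target, masker, snr_db):
        """Scale `masker` so that the target-to-masker ratio equals `snr_db`,
        then return the mixture. Both inputs are 1-D float arrays at the same
        sampling rate; `masker` must be at least as long as `target`."""
        masker = masker[: len(target)]               # trim masker to target length
        target_rms = np.sqrt(np.mean(target ** 2))   # RMS level of the speech
        masker_rms = np.sqrt(np.mean(masker ** 2))   # RMS level of the masker
        # SNR_dB = 20 * log10(target_rms / masker_rms), so the masker RMS
        # needed for the requested SNR is:
        desired_masker_rms = target_rms / (10 ** (snr_db / 20))
        scaled_masker = masker * (desired_masker_rms / masker_rms)
        return target + scaled_masker

    # Example: a -5 dB SNR mixture of a sentence with a stand-in SSN masker
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)   # placeholder for a recorded sentence
    noise = rng.standard_normal(20000)    # placeholder for speech-shaped noise
    mixture = mix_at_snr(speech, noise, snr_db=-5.0)

Computing RMS over the full target duration is one design choice; some studies instead equalize levels over only the speech-active portions of the target.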
Results from both experiments showed that listeners benefited significantly from CS and NAS for all masker types. The intelligibility gain for NAS relative to speech produced in quiet was significantly larger than the gain for CS relative to conversational speech, indicating that the acoustic cues in NAS may be more accessible overall. In both experiments, the two speaking style modifications increased intelligibility most in SSN and least in 2T babble. This indicates that speaking style adaptations improve word recognition most under predominantly energetic masking (SSN) and are less beneficial when larger spectro-temporal dips in the masker reduce energetic masking (2T babble). Results also revealed that the intelligibility benefit from NAS and CS was greater for high-predictability than for low-predictability sentences for all masker types. This suggests that listeners may be better at exploiting acoustic cues for speech recognition in noise when semantic cues are available.