Investigating the role of linguistic information in perception of voice cues
Speech perception is strongly dependent on the perception of talkers’ voices. Through voices, we identify individual talkers, and in situations with multiple talkers (e.g., cocktail-party situations), voices help us discriminate between speakers.
Talker voice perception is a major challenge for cochlear implant (CI) listeners because of the spectro-temporally degraded signal the CI delivers, a consequence of limitations inherent to electric stimulation. Limitations in voice perception may contribute to their difficulties perceiving and understanding speech.
Talker voice perception relies on vocal source and articulatory cues, which convey indexical information about the speaker, but linguistic information also influences the perception of talkers’ voices. Acoustic-phonetic variability that affects phonetically relevant properties of speech impairs linguistic processing, and the resulting slower processing enables listeners to attend more closely to voice characteristics. Furthermore, perceptual vocal variability may be stored in linguistic memory representations, so that sensitivity to talker voice details varies with the type of linguistic information. Although the interaction between linguistic information and talker voice perception is well established, it remains unknown how acoustic-phonetic properties of speech and different types of linguistic information interact with the perception of individual voice cues.
The current research investigates the role of linguistic variability (e.g., words vs. nonwords) in the perception of individual vocal characteristics. Just-noticeable differences (JNDs) were obtained for fundamental frequency (F0) and for formant frequency distributions related to vocal tract length (VTL), by presenting normal-hearing (NH) listeners with artificial F0 and VTL manipulations in a three-alternative forced-choice (3AFC) paradigm. JNDs were compared for easy and hard words, classified by lexical frequency and neighborhood density (NHD), and for easy and hard nonwords, classified by NHD and phonotactic probability. Furthermore, token identity was varied: listeners heard either three identical tokens or three different tokens per trial.
Preliminary findings show that voice cue perception is affected by language-specific acoustic-phonetic details rather than by top-down lexical characteristics, and suggest that voice cue perception in nonwords depends on phonotactics. This confirms that voice cue perception depends on acoustic-phonetic features of speech. Similar thresholds were observed for easy and hard words, strengthening the idea that linguistic variability affects voice cue perception only when it alters acoustic-phonetic details of speech. Better insight into the nature of talker voice perception may be a next step towards improving talker voice perception in CI listeners, which may eventually improve speech perception in this clinical group.