‘Normal’ hearing thresholds and figure-ground perception explain significant variability in speech-in-noise performance
Speech-in-noise (SIN) perception is critical in everyday listening, yet performance varies widely across individuals and cannot be fully explained by the pure-tone audiogram. One factor that likely contributes to difficulty understanding SIN is the ability to separate speech from simultaneously occurring background sounds, which is probably not well assessed by audiometric thresholds. A basic task that assesses the ability to separate target and background sounds is auditory figure-ground perception. Here, we examined how much common variance links SIN perception to figure-ground perception, and how this relationship depends on the properties of the figure to be detected.
We recruited 96 participants with normal hearing (6-frequency average pure-tone thresholds < 20 dB HL). We presented sentences from the Oldenburg matrix corpus (e.g., "Alan has two old sofas") simultaneously with multi-talker babble noise. We adapted the target-to-masker ratio (TMR) to determine each participant's threshold for reporting 50% of sentences correctly. Our figure-ground stimuli were based on those of Teki et al. (2013; PMID 23898398), in which each 50 ms time window contains random frequency elements. Figure frequencies either remained fixed or changed over time, mimicking the formants of speech. Participants had to discriminate gaps that occurred in the “figure” or “background” components, a task that cannot be performed based on global stimulus characteristics. We adapted the TMR to determine each participant's 50% threshold for discriminating gaps in the figure-ground stimuli.
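The adaptive TMR procedure described above can be illustrated with a minimal sketch. The abstract does not state which adaptive rule was used, so the 1-up/1-down staircase below (which converges on the 50%-correct point of a symmetric psychometric function) is an assumption, as are the step size, trial count, reversal-averaging rule, and the hypothetical `simulated_listener` function that stands in for a real participant.

```python
import math
import random


def simulated_listener(tmr_db, threshold_db=-6.0, slope_per_db=1.0, rng=random):
    """Hypothetical listener: logistic psychometric function centred on threshold_db."""
    p_correct = 1.0 / (1.0 + math.exp(-slope_per_db * (tmr_db - threshold_db)))
    return rng.random() < p_correct


def one_up_one_down(respond, start_tmr_db=0.0, step_db=2.0, n_trials=60, n_discard=2):
    """Estimate the 50%-correct TMR as the mean TMR at track reversals.

    respond(tmr_db) must return True for a correct trial and False otherwise.
    All default parameter values are illustrative assumptions.
    """
    tmr_db = start_tmr_db
    last_direction = 0  # 0 until the first response has been scored
    reversals = []
    for _ in range(n_trials):
        # Correct -> make the next trial harder (lower TMR); incorrect -> easier.
        direction = -1 if respond(tmr_db) else 1
        if last_direction != 0 and direction != last_direction:
            reversals.append(tmr_db)  # the track changed direction on this trial
        last_direction = direction
        tmr_db += direction * step_db
    usable = reversals[n_discard:]  # drop the earliest reversals (approach phase)
    if not usable:
        raise ValueError("too few reversals to estimate a threshold")
    return sum(usable) / len(usable)


# Usage with simulated (not measured) responses.
rng = random.Random(0)
estimate = one_up_one_down(lambda tmr: simulated_listener(tmr, rng=rng), n_trials=400)
print(f"Estimated 50% threshold: {estimate:.1f} dB TMR (simulated truth: -6.0 dB)")
```

The same routine could in principle serve both the sentence task and the gap-discrimination task, since each adapts TMR towards a 50% threshold; only the `respond` callback would differ.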
Average audiometric thresholds at 4–8 kHz accounted for 15% of the variance in SIN performance, even though we recruited participants whose hearing thresholds would be considered clinically ‘normal’. Figure-ground performance explained a significant portion of the variance in SIN performance that was unaccounted for by variability in audiometric thresholds. Performance with different figure-ground stimuli explained different portions of the variance, demonstrating that they index different reasons why people find SIN difficult.
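One way to quantify variance in SIN performance that figure-ground performance explains beyond audiometric thresholds is an incremental R² from nested least-squares models. The sketch below is illustrative only: the abstract does not specify the statistical model used, the function names are hypothetical, and the data are simulated rather than taken from the study. It relies on the fact that, after the audiogram is regressed out of the figure-ground scores, the squared correlation between SIN and that residual equals the gain in R² from adding figure-ground to an audiogram-only model.

```python
import random


def mean(values):
    return sum(values) / len(values)


def residuals(y, x):
    """Residuals of y after simple least-squares regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def r_squared(y, x):
    """Proportion of variance in y explained by a single predictor x."""
    my = mean(y)
    ss_res = sum(r * r for r in residuals(y, x))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r_squared(sin_scores, audiogram, figure_ground):
    """Return (R² for audiogram alone, extra R² from adding figure-ground)."""
    r2_base = r_squared(sin_scores, audiogram)
    # Part of figure-ground performance that is not shared with the audiogram.
    fg_unique = residuals(figure_ground, audiogram)
    # Squared semipartial correlation = R²(full model) - R²(audiogram only).
    delta_r2 = r_squared(sin_scores, fg_unique)
    return r2_base, delta_r2


# Simulated listeners (arbitrary effect sizes; NOT the study's data).
rng = random.Random(1)
audiogram = [rng.gauss(0, 1) for _ in range(96)]
figure_ground = [0.3 * a + rng.gauss(0, 1) for a in audiogram]
sin_scores = [0.5 * a + 0.5 * f + rng.gauss(0, 0.5)
              for a, f in zip(audiogram, figure_ground)]

r2_base, delta_r2 = incremental_r_squared(sin_scores, audiogram, figure_ground)
print(f"Audiogram alone: R² = {r2_base:.2f}; added by figure-ground: ΔR² = {delta_r2:.2f}")
```

Running the same comparison separately for each figure-ground stimulus type would show whether different stimuli account for different portions of the remaining variance.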
These results in normally hearing listeners demonstrate that SIN performance depends on sub-clinical variability in audiometric thresholds. In addition, the results show that we can better predict SIN performance by including measures of figure-ground perception alongside audiometric thresholds. Importantly, the results support a source of variance in speech-in-noise perception related to figure-ground perception that is unrelated to audiometric thresholds. Given that previous work demonstrates cortical contributions to both speech-in-noise and figure-ground perception, this shared variance likely arises at a central level. Overall, these results highlight the importance of considering both central and peripheral factors if we are to successfully predict speech intelligibility when background noise is present.