Using automatic speech recognition for the prediction of impaired speech identification
Age-related hearing loss (ARHL) is a highly prevalent hearing disorder in adults that impairs the ability to understand speech, especially in noisy environments. The most common rehabilitation strategy is to fit hearing aids (HAs). Their benefit is generally assessed by measuring speech-identification performance with and without HAs. However, this procedure, known as “speech audiometry”, can be lengthy, and its results are likely to be influenced by the patient’s level of fatigue, cognitive state and familiarity with the speech material used for the assessment.
To overcome these issues, the feasibility of using objective measures based on automatic speech recognition (ASR) to predict human speech-identification performance has recently been investigated (Fontan et al., 2017; Fontan et al., in preparation; Kollmeier et al., 2016).
Here, we present the results of a series of experiments that combined ASR and an ARHL simulation to predict human performance on various tasks ranging from phoneme discrimination to sentence identification. More specifically, signal-processing techniques (Nejime & Moore, 1997) were used to process the speech tokens so as to mimic some of the perceptual consequences of ARHL (i.e., elevated thresholds, reduced frequency selectivity and loudness recruitment), and the processed speech tokens were then fed to an ASR system for analysis. To provide “proof-of-concept”, our first experiments focussed on the prediction of unaided speech perception in quiet, while subsequent experiments investigated the applicability of the ASR system to aided and unaided speech perception in noise.
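To make the first stage of such a pipeline concrete, the following minimal sketch illustrates only one of the three simulated consequences, elevated thresholds, as a frequency-dependent attenuation applied before the signal would be passed to an ASR system. It is not the Nejime and Moore (1997) algorithm: reduced frequency selectivity and loudness recruitment are not modelled, and the function name, the audiogram values and the use of a plain FFT filter are all assumptions made for illustration.

```python
import numpy as np


def simulate_threshold_elevation(signal, sample_rate, audiogram_freqs_hz, hearing_loss_db):
    """Attenuate each frequency component by an interpolated hearing loss in dB.

    Hypothetical helper: a linear attenuation only, not a full ARHL simulation.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    # Interpolate the audiogram onto the FFT bins; np.interp holds the
    # first and last values constant outside the audiogram range.
    loss_db = np.interp(freqs, audiogram_freqs_hz, hearing_loss_db)
    gain = 10.0 ** (-loss_db / 20.0)  # dB loss converted to linear amplitude gain
    return np.fft.irfft(spectrum * gain, n=signal.size)


# Illustrative sloping high-frequency loss (values are assumptions, not data).
AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 8000]
HEARING_LOSS_DB = [10, 20, 30, 40, 60, 70]

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech_like = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 4000 * t)
degraded = simulate_threshold_elevation(
    speech_like, sample_rate, AUDIOGRAM_FREQS_HZ, HEARING_LOSS_DB
)
# `degraded` would then be written to disk or passed to the ASR system.
```

In this sketch the high-frequency component is attenuated far more than the low-frequency one, which is the qualitative pattern expected for a sloping loss; a realistic simulation would additionally smear the spectrum and expand the loudness growth function.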
Fontan, L., Cretin-Maitenaz, T., & Füllgrabe, C. (In preparation). Automatic speech recognition predicts speech perception in older hearing-impaired listeners.
Fontan, L., Ferrané, I., Farinas, J., Pinquier, J., Magnen, C., Tardieu, J., Gaillard, P., Aumont, X., & Füllgrabe, C. (2017). Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss. Journal of Speech, Language, and Hearing Research, 60, 2394-2405.
Kollmeier, B., Schädler, M. R., Warzybok, A., Meyer, B. T., & Brand, T. (2016). Sentence recognition prediction for hearing-impaired listeners in stationary and fluctuation noise with FADE: Empowering the attenuation and distortion concept by Plomp with a quantitative processing model. Trends in Hearing, 20, 233121651665579.
Nejime, Y., & Moore, B. C. J. (1997). Simulation of the effect of threshold elevation and loudness recruitment combined with reduced frequency selectivity on the intelligibility of speech in noise. Journal of the Acoustical Society of America, 102, 603-615.