A realistic test platform for near-end listening enhancement
Near-end listening enhancement (NELE) aims to improve the intelligibility of speech playback in noise. NELE algorithms are often evaluated in tightly controlled acoustic conditions, e.g. using synthetic speech-shaped noise as a masker and not accounting for reverberation. While this is advantageous in terms of reproducibility, the benefit of NELE algorithms in real-world scenarios, e.g. for public announcements or telephone calls, may be overestimated. To create a more realistic test platform, two representative real-life scenarios were simulated: a large, crowded public space (the cafeteria) and a small domestic environment (the living room), representing a source of stationary and of fluctuating noise, respectively. Binaural impulse responses of real spaces [1] and live noise recordings were used for the simulations.
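The simulation chain is not detailed here. As a minimal sketch of how such a scene could be assembled for one ear, assuming the speech is convolved with one channel of a binaural room impulse response and the recorded noise is then scaled to a target SNR, the following may help. The function names (convolve, rms, mix_at_snr) are hypothetical, and the SNR convention (RMS of reverberant speech relative to RMS of noise) is an assumption rather than the definition used in the study.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise RMS ratio equals snr_db.

    Returns the mixture and the scaled noise (assumed SNR convention:
    broadband RMS over the whole signal, one ear at a time).
    """
    if len(noise) < len(speech):
        raise ValueError("noise recording is shorter than the speech signal")
    noise = noise[:len(speech)]
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Toy data standing in for real recordings (not from the study).
dry_speech = [math.sin(0.3 * n) for n in range(200)]
brir_left = [1.0, 0.0, 0.5, 0.0, 0.25]           # hypothetical left-ear response
noise_left = [math.sin(1.7 * n + 0.4) for n in range(400)]

reverberant = convolve(dry_speech, brir_left)
mixture, scaled_noise = mix_at_snr(reverberant, noise_left, snr_db=-3.0)
```

For a binaural presentation, the same two steps would be applied separately with the left-ear and right-ear channels of the impulse response and of the noise recording. For real audio, an FFT-based convolution would be far faster than the direct loop used above.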
A listening test with N=24 normal-hearing subjects was conducted. Intelligibility scores (percentage of correctly identified keywords) for unmodified speech were compared with those of a landmark study [2] on speech intelligibility in noise. The results indicate that higher SNRs are needed to achieve the same intelligibility levels when realistic noise is used, with differences of up to 8.6 dB. Preliminary results for a selection of NELE algorithms suggest that realistic noise is also more challenging for modified speech, regardless of the type of modification.
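As an illustration of the scoring metric, the sketch below computes a keyword percentage for a single sentence. The function name keyword_score is hypothetical, and the matching rule (case-insensitive exact word match, each response word counted at most once) is an assumption; the scoring rules actually applied in the listening test are not specified here. The example sentence is invented.

```python
def keyword_score(target_keywords, response):
    """Percentage of target keywords found in a listener's typed response."""
    response_words = response.lower().split()
    hits = 0
    for keyword in target_keywords:
        word = keyword.lower()
        if word in response_words:
            response_words.remove(word)  # each response word matches only once
            hits += 1
    return 100.0 * hits / len(target_keywords)


# Invented example: three of the five keywords were reported correctly.
score = keyword_score(
    ["glue", "sheet", "dark", "blue", "background"],
    "the sheet was dark blue",
)
```

Averaging such per-sentence scores over all sentences presented at a given SNR would give one point of an intelligibility-versus-SNR curve.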
This study exposes the gap between controlled laboratory conditions and the proposed real-world simulations; the latter can provide a more meaningful prediction of the performance of NELE algorithms (and possibly other technologies) in real-life scenarios.
References
1. Kayser, Hendrik, et al. "Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses." EURASIP Journal on Advances in Signal Processing, 2009: 6.
2. Cooke, Martin, et al. "Evaluating the intelligibility benefit of speech modifications in known noise conditions." Speech Communication 55.4 (2013): 572-585.