Speaker and speech dependence in a deep neural network speech separation algorithm
Hearing aid users are challenged in listening situations with noise, and especially in speech-on-speech situations with two or more competing voices. In particular, segregating two competing voices is very difficult for them, unlike for normal-hearing listeners.
Recently, deep neural network (DNN) algorithms have shown great potential for tasks such as blind source separation of a single-channel (monaural) mixture of multiple voices. The idea is to train the algorithm on relatively short samples of clean speech from each voice, thereby learning the characteristics of that voice. Once trained for those specific voices, the network can then be applied to mixtures of new speech samples from the same voices.
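The abstract does not describe the network architecture or training procedure, so the sketch below is only one way to make the idea concrete: a simple mask-estimation DNN in PyTorch, trained on synthetic two-talker mixtures built from clean speech of the two known voices. The names `MaskNet`, `stft_mag`, and `train_step` are hypothetical, and random tensors stand in for the recorded speech.

```python
# Minimal sketch (assumption, not the authors' implementation) of voice-specific
# mask-based separation: a small DNN estimates a time-frequency mask for the
# target voice from the magnitude spectrogram of a two-talker mixture.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128
N_BINS = N_FFT // 2 + 1
WINDOW = torch.hann_window(N_FFT)

class MaskNet(nn.Module):
    """Small DNN that estimates a soft time-frequency mask for the target voice."""
    def __init__(self, n_bins=N_BINS, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mix_mag):            # mix_mag: (frames, n_bins)
        return self.net(mix_mag)

def stft_mag(x):
    """Magnitude spectrogram of a 1-D waveform, shape (frames, n_bins)."""
    spec = torch.stft(x, N_FFT, HOP, window=WINDOW, return_complex=True)
    return spec.abs().T

model = MaskNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(target_clean, masker_clean):
    """One training step on a synthetic mixture of the two known voices."""
    mix_mag = stft_mag(target_clean + masker_clean)
    tgt_mag = stft_mag(target_clean)
    est_mag = model(mix_mag) * mix_mag      # masked mixture magnitude
    loss = nn.functional.mse_loss(est_mag, tgt_mag)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Placeholder clean excerpts of the target and masker voices (1 s at 16 kHz);
# after training, the model would be applied to mixtures of new utterances
# from the same voices.
target_excerpt = torch.randn(16000)
masker_excerpt = torch.randn(16000)
print(train_step(target_excerpt, masker_excerpt))
```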
With this voice-specific training on the Danish HINT sentence material (Nielsen and Dau, 2011), the current implementation of the DNN has shown a benefit for hearing-impaired listeners (Bramsløw et al., 2018), but the network may also provide a benefit when applied to new voices.
New speech material has been recorded, comprising both HINT sentences and continuous speech, using three new male and three new female voices. The present study investigated the effect of changing targets and maskers in voice-specific DNNs, using objective metrics as predictors of speech separation performance. Furthermore, the effect of training on sentence material and testing on continuous material, and vice versa, was evaluated.
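The abstract does not name the objective metrics that were used as predictors. As an illustration of such a metric, the sketch below computes scale-invariant signal-to-distortion ratio (SI-SDR), one common way to score a separated signal against its clean reference; the helper `si_sdr` is hypothetical and not part of the study.

```python
# Illustrative example of an objective separation metric (assumption: the study
# may have used different metrics). SI-SDR compares a separated estimate with
# the clean reference signal, invariant to overall scaling.
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10(np.sum(target**2) / np.sum(noise**2))
```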