An assessment of objective intelligibility metrics for signals with low mixture signal-to-noise ratios after enhancement using Ideal Binary Masks
This study examines how well objective indicators of speech intelligibility correlate with the percentage of words correctly identified when speech is mixed with white Gaussian noise at low signal-to-noise ratios (SNRs) and subsequently enhanced with Ideal Binary Masks (IBMs). Such masks require a priori knowledge of both the target signal and the masker. The objective indicators under consideration include Short-Time Objective Intelligibility (STOI), which is suitable for both noisy and degraded speech, including non-linearly processed or time-frequency weighted speech. STOI correlates the envelopes of the clean and the degraded (or processed) speech signals after they have been divided into overlapping short-time (384 ms) segments. In this study, we mixed speech produced by two male and two female speakers of British English with white Gaussian noise at SNRs as low as -25 dB. Signals were subsequently enhanced using IBMs with a Local Criterion (LC) equal to 0 dB or to the mixture SNR. Listening tests with normal-hearing listeners were carried out, in which each signal was presented to the listener three times. The results characterise the relationship between the objective indicators and the percentages of words correctly identified by the listeners in the context of low mixture SNRs.
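As an illustration of the processing described above, the following minimal sketch mixes a signal with white Gaussian noise at a chosen SNR and computes an IBM for a given Local Criterion. It is not the authors' implementation: the function names, the 512-sample Hann-windowed frames with 50% overlap, the strict "greater than" comparison against the Local Criterion, and the amplitude-modulated tone standing in for speech are all assumptions made for the example.

```python
import numpy as np


def mix_at_snr(target, snr_db, rng):
    """Add white Gaussian noise scaled so that the mixture SNR equals snr_db."""
    noise = rng.standard_normal(target.shape[0])
    target_power = np.mean(target ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(target_power / scaled_noise_power) = snr_db.
    gain = np.sqrt(target_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    noise = gain * noise
    return target + noise, noise


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram (frames x bins); frame settings are assumed, not from the study."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def ideal_binary_mask(target, noise, lc_db, eps=1e-12):
    """Return 1 where the local SNR of a time-frequency unit exceeds lc_db, else 0.

    Needs the target and the masker separately, i.e. a priori knowledge of both.
    """
    local_snr_db = 10.0 * np.log10(
        (stft_power(target) + eps) / (stft_power(noise) + eps)
    )
    return (local_snr_db > lc_db).astype(np.float64)


rng = np.random.default_rng(0)
fs = 16000  # assumed sampling rate in Hz
t = np.arange(fs) / fs
# Amplitude-modulated tone used as a stand-in for a speech signal.
target = np.sin(2 * np.pi * 440 * t) * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))

snr_db = -25.0
mixture, noise = mix_at_snr(target, snr_db, rng)

mask_lc_zero = ideal_binary_mask(target, noise, lc_db=0.0)
mask_lc_snr = ideal_binary_mask(target, noise, lc_db=snr_db)

print("units retained, LC = 0 dB:       ", int(mask_lc_zero.sum()))
print("units retained, LC = mixture SNR:", int(mask_lc_snr.sum()))
```

Enhancement would then multiply the mask by the short-time spectrum of the mixture and resynthesise the waveform. Because the comparison threshold is lower, setting the Local Criterion to a negative mixture SNR retains at least as many time-frequency units as setting it to 0 dB.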