Abstract
This study tested the ability of 80 listeners to comprehend face mask speech presented in an audio-visual context. Nine speakers of southern standard British English produced a total of 120 sentences across three conditions (no mask, cotton mask and niqāb). The speakers were recorded in an audio-visual context, and two types of stimuli were generated for a listening test. The first type was control stimuli, with matching audio and video components. The second was mismatched stimuli, in which, for example, the speaker spoke with a cotton mask but was visually presented wearing no mask. The sentence stimuli were taken from Kalikow et al. (1977) and modified to incorporate minimal pairs likely to yield confusion in ordinary speech (/f/ and /s/, /f/ and /h/, /p/ and /k/, /p/ and /h/, and /s/ and /ʃ/). For example, “Molly thought about the FORD/SWORD”. After watching each video, participants saw two images and were asked to pick, via keyboard response, the image indicated in the sentence. One set of listener participants had no exposure to face coverings and the other had daily exposure. Neither increased exposure to face coverings nor face mask audio (cotton mask and niqāb) had an effect on performance accuracy. However, the visual presence of a face covering (cotton mask and niqāb) was associated with a significant reduction in performance accuracy.