Abstract
Can video of the minute vibrations of objects in a room be a practical means of recovering intelligible speech? Several aspects of the problem were investigated: how much an object vibrates in response to sound; what can be done to maximize that vibration; how the MIT technique detects minute vibrations in video; what affects the quality of the resulting recording; and how good a recording must be for speech to be intelligible. The success of earlier experiments depended on loud sounds and on prior knowledge. Additional factors, previously ignored, are expected to be significant, but they can be accommodated. Ways of enhancing the technique by exploiting resonances or reflections are also discussed. A high-speed, low-noise camera is essential, and existing video footage (such as from CCTV) is unlikely to be of sufficient quality. Further experiments with high-end, high-speed cameras are needed to assess whether the technique is practically feasible.