Abstract
Visual feedback of articulators using Electromagnetic Articulography (EMA) has been shown to aid the acquisition of non-native speech sounds. However, physical EMA sensors are expensive and invasive, which makes them impractical for providing real-world pronunciation feedback. Our work focuses on using neural Acoustic-to-Articulatory Inversion (AAI) models to map speech directly to EMA sensor positions. Self-Supervised Learning (SSL) speech models, such as HuBERT, produce speech representations that have been shown to significantly improve performance on AAI tasks. Probing experiments have indicated that certain layers and iterations of SSL models produce representations that may yield better inversion performance than others. In this work, we build on these probing results to create an AAI model that improves upon a state-of-the-art baseline inversion model, and we evaluate the model's suitability for second-language pronunciation training.