Abstract
By the end of the century, over half of the 6,500 languages spoken in the world will die out (Turin, 2007). Nepal's situation is particularly dire: of the 120+ distinct languages identified in the 2011 census, 60 are endangered due to globalisation, socio-political unrest, and environmental challenges. The loss of these languages also means the loss of unique cultural and religious identifiers. Given this, there is a need for methods and tools to preserve linguistic diversity. A major challenge in language preservation, however, is the transcription bottleneck (Shi et al., 2021): transcribing one minute of audio requires an average of 40+ minutes (Durantin et al., 2017). The task becomes even more complicated for endangered languages with no (standardised) orthographies or documentation. While advanced automatic speech-recognition (ASR) tools are available, they are often ineffective for these extremely low-resource languages (Foley et al., 2018). This poster presents preliminary results on addressing these issues for Newar and Dzardzongke (two languages from different branches of the Sino-Tibetan language family, both spoken in Nepal) using Wav2Vec2 models fine-tuned for low-resource languages (Coto-Solano, 2021, 2022). We show that ASR for endangered languages benefits from a specific set of optimisation procedures: comparing Kaldi with Wav2Vec2, applying different types of data augmentation, and developing a new orthography or standardising an existing one.
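To make the data-augmentation step concrete, the sketch below shows two waveform-level augmentations commonly used to enlarge small ASR corpora: additive noise at a target signal-to-noise ratio and speed perturbation. This is an illustrative sketch only; the function names, parameters, and choice of augmentations are assumptions, not the specific procedures evaluated in the poster.

```python
# Illustrative waveform-level augmentations for small ASR corpora.
# All names and parameter choices here are hypothetical examples.
import numpy as np


def add_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Change playback speed by resampling with linear interpolation
    (factor > 1 shortens the signal, i.e. speeds it up)."""
    n_out = int(round(len(signal) / factor))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


if __name__ == "__main__":
    t = np.linspace(0, 1, 16000)           # one second at 16 kHz
    clean = np.sin(2 * np.pi * 440 * t)    # 440 Hz tone as a stand-in utterance
    noisy = add_noise(clean, snr_db=20)    # same length, noise added
    fast = speed_perturb(clean, 1.1)       # ~10% shorter signal
    print(len(clean), len(noisy), len(fast))
```

Each augmented copy of an utterance keeps its original transcript, so a small corpus can be multiplied several times over before fine-tuning.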