Abstract
By the end of the century, over half of the 6,500 languages spoken in the world will die out (Turin, 2007). Nepal's situation is particularly dire: of the 120+ distinct languages identified in the 2011 census, 60 are endangered due to globalisation, socio-political unrest, and environmental challenges. The loss of these languages also means the loss of unique cultural and religious identifiers. There is therefore a need for methods and tools to preserve linguistic diversity. A major challenge in language preservation, however, is the transcription bottleneck (Shi et al., 2021): transcribing one minute of audio requires an average of 40+ minutes (Durantin et al., 2017). The task becomes even harder for endangered languages with no (standardised) orthography or documentation. While advanced automatic speech-recognition (ASR) tools are available, they are often ineffective for these extremely low-resource languages (Foley et al., 2018). This poster presents preliminary results addressing these issues for Newar and Dzardzongke (two languages from different branches of the Sino-Tibetan family, both spoken in Nepal), using Wav2Vec2 models fine-tuned for low-resource languages (Coto-Solano, 2021, 2022). We show that endangered languages benefit from a specific set of optimisation procedures, tested through comparisons of Kaldi and Wav2Vec2, different types of data augmentation, and the development of a new orthography or the standardisation of an existing one.
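To illustrate the kind of data augmentation the abstract refers to, below is a minimal sketch of two common waveform-level augmentations used when fine-tuning ASR models on small corpora: additive noise at a target signal-to-noise ratio, and speed perturbation. The function names, parameter values, and 16 kHz sampling rate are illustrative assumptions, not details from the poster.

```python
import numpy as np

def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Change playback speed by `factor` via linear-interpolation resampling."""
    old_idx = np.arange(len(wave))
    new_len = int(len(wave) / factor)
    new_idx = np.linspace(0, len(wave) - 1, new_len)
    return np.interp(new_idx, old_idx, wave)

# Example: augment a one-second 16 kHz tone standing in for a speech recording.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)
noisy = add_noise(clean, snr_db=20, rng=rng)   # same length, noise added
fast = speed_perturb(clean, factor=1.1)        # ~10% shorter signal
```

Augmentations like these multiply the effective size of a small transcribed corpus before fine-tuning, which is one plausible reason they help in the extremely low-resource settings the poster describes.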


