Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

14 November 2024, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Our work empirically assesses a novel hypothesis: that language acquisition theories can help design more effective curriculum learning strategies for Cognitively-inspired Small-Scale Language Models (SSLMs). SSLMs are Transformer-based language models trained on corpora that approximate the volume and nature of input a first-language learner can expect to receive during language acquisition. Curriculum Learning (CL) emerged in the first BabyLM Shared Task as a promising method for improving the cognitive plausibility of SSLMs, by gradually introducing more complex linguistic phenomena later in training in a manner similar to human language acquisition. However, CL strategies have not led to considerable improvements over non-curriculum models. This runs contrary to the predictions of linguistic theory, which suggests that children naturally focus on input that is neither too simple nor too difficult but at the right level of challenge for learning. This acquisition behaviour, known as the “Goldilocks Effect”, is a form of self-selecting curriculum learning that appears to occur naturally in first language (L1) acquisition. We compare the success of three curricula (Growing, Inwards & MMM) that precisely replicate the predictions of contrastive acquisition theories, grounded in contemporary Chomskyan acquisition models, to specify fine-grained curriculum learning strategies over a volume of Child-Directed Speech (CDS) that a learner would expect to receive by age six (6;0), across five languages. Overall, Curriculum Learning strategies that precisely replicate language acquisition theories, formulated through careful analysis of child developmental sequences, can lead to better-performing data-efficient architectures.
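To make the general notion of a curriculum concrete, the sketch below orders a toy child-directed speech corpus by a crude complexity proxy and releases it to the trainer in cumulative stages. This is an illustrative assumption only: the utterance-length proxy, the three-stage split, and every function name are hypothetical, and the abstract does not specify how the Growing, Inwards, or MMM curricula actually rank input.

```python
# Minimal sketch of a difficulty-ordered curriculum over child-directed speech.
# Illustration only: the length-based complexity() proxy, the three-stage split,
# and these function names are assumptions, not the paper's Growing/Inwards/MMM
# formulations.

from typing import Iterable, Iterator


def complexity(utterance: str) -> int:
    """Crude difficulty proxy: whitespace token count (a stand-in for MLU)."""
    return len(utterance.split())


def curriculum_stages(corpus: Iterable[str], n_stages: int = 3) -> list[list[str]]:
    """Sort utterances from simplest to most complex and split into stages."""
    ranked = sorted(corpus, key=complexity)
    stage_size = max(1, len(ranked) // n_stages)
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]


def curriculum_pools(corpus: Iterable[str]) -> Iterator[list[str]]:
    """Yield cumulative training pools: each stage adds harder material, so
    early training sees only the simplest utterances and later training sees
    the full corpus."""
    pool: list[str] = []
    for stage in curriculum_stages(corpus):
        pool.extend(stage)
        yield list(pool)


if __name__ == "__main__":
    cds = [
        "look!",
        "where is the ball?",
        "shall we read the story about the little red hen?",
    ]
    for step, pool in enumerate(curriculum_pools(cds)):
        print(f"stage {step}: {len(pool)} utterances available to the trainer")
```

A non-curriculum baseline would simply sample from the full corpus from the start; the contrast between that setting and staged schedules like the one sketched here is what the paper's comparisons evaluate.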
