Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

14 November 2024, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Our work empirically assesses a novel hypothesis: that language acquisition theories can help design more effective curriculum learning strategies for Cognitively-inspired Small-Scale Language Models (SSLMs). SSLMs are Transformer-based language models trained on corpora that approximate the volume and nature of input a first-language learner can expect to receive during language acquisition. Curriculum Learning (CL) emerged in the first BabyLM Shared Task as a promising method for improving the cognitive plausibility of SSLMs, by gradually introducing more complex linguistic phenomena later in training in a manner similar to human language acquisition. However, CL strategies have not led to considerable improvements over non-curriculum models. This runs contrary to the predictions of linguistic theory, which suggests that children naturally focus on input that is neither too simple nor too difficult but at the right level of challenge for learning. This acquisition behaviour, known as the “Goldilocks Effect”, is a form of self-selecting curriculum learning that appears to occur naturally in first language (L1) acquisition. We compare the success of three curricula (Growing, Inwards & MMM) that precisely replicate the predictions of contrastive acquisition theories, grounded in contemporary Chomskyan acquisition models, to specify fine-grained curriculum learning strategies over a volume of Child-Directed Speech (CDS) that a learner would expect to receive by age six (6;0), across five languages. Overall, Curriculum Learning strategies that precisely replicate language acquisition theories, formulated through careful analysis of child developmental sequences, can lead to better-performing data-efficient architectures.
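To make the general notion of a curriculum concrete, the sketch below orders a toy child-directed speech corpus by a crude complexity proxy and releases it to the trainer in cumulative stages. This is an illustrative assumption only: the utterance-length proxy, the three-stage split, and every function name are hypothetical, and the abstract does not specify how the Growing, Inwards, or MMM curricula actually rank input.

```python
# Minimal sketch of a difficulty-ordered curriculum over child-directed speech.
# Illustration only: the length-based complexity() proxy, the three-stage split,
# and these function names are assumptions, not the paper's Growing/Inwards/MMM
# formulations.

from typing import Iterable, Iterator


def complexity(utterance: str) -> int:
    """Crude difficulty proxy: whitespace token count (a stand-in for MLU)."""
    return len(utterance.split())


def curriculum_stages(corpus: Iterable[str], n_stages: int = 3) -> list[list[str]]:
    """Sort utterances from simplest to most complex and split into stages."""
    ranked = sorted(corpus, key=complexity)
    stage_size = max(1, len(ranked) // n_stages)
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]


def curriculum_pools(corpus: Iterable[str]) -> Iterator[list[str]]:
    """Yield cumulative training pools: each stage adds harder material, so
    early training sees only the simplest utterances and later training sees
    the full corpus."""
    pool: list[str] = []
    for stage in curriculum_stages(corpus):
        pool.extend(stage)
        yield list(pool)


if __name__ == "__main__":
    cds = [
        "look!",
        "where is the ball?",
        "shall we read the story about the little red hen?",
    ]
    for step, pool in enumerate(curriculum_pools(cds)):
        print(f"stage {step}: {len(pool)} utterances available to the trainer")
```

A non-curriculum baseline would simply sample from the full corpus from the start; the contrast between that setting and staged schedules like the one sketched here is what the paper's comparisons evaluate.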
