Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?

📅 2025-11-11
🏛️ Proceedings of the First BabyLM Workshop
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether syntactic category information can enhance the effectiveness of developmentally motivated curriculum learning in language model training and elucidates the explanatory role of syntactic knowledge in linguistic competence.
Method: Leveraging the BabyLM and CHILDES corpora, we systematically characterize syntactic distribution patterns across developmental stages of child language acquisition, and propose a cognitively inspired, syntax-aware curriculum strategy that dynamically selects training subsets based on syntactic complexity and token frequency.
Contribution/Results: Experiments demonstrate that syntax-classifiable subsets, contrasted with noisy full-data baselines, yield consistent improvements on downstream tasks such as reading comprehension (+3.2% average gain). Moreover, the curriculum progression closely aligns with empirically observed human language development trajectories. To our knowledge, this is the first work to integrate fine-grained syntactic structure modeling into a developmental curriculum learning framework, establishing a novel paradigm for interpretable, cognitively grounded language model training.

📝 Abstract
We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore developmental and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvements come from using the subset of syntactically categorizable data rather than the full noisy corpus.
Problem

Research questions and friction points this paper is trying to address.

Investigating syntactic category benefits in curriculum learning
Analyzing syntactic properties of child language corpora
Evaluating curriculum approaches for linguistic task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using syntactic categorization to filter training data
Applying developmental curriculum learning approaches
Analyzing syntactic properties for model interpretation
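
The filtering idea listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the category labels, the stage order in `STAGE_ORDER`, and the punctuation-based `toy_categorize` function are hypothetical stand-ins for whatever syntactic categorizer and curriculum the authors actually use. The sketch shows only the general shape: utterances that receive no category are dropped, and the remainder are ordered by an assumed stage.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical category labels and stage order, purely illustrative
# (not the categories or ordering used in the paper).
STAGE_ORDER = {"fragment": 0, "imperative": 1, "question": 2, "declarative": 3}


def toy_categorize(utterance: str) -> Optional[str]:
    """Toy punctuation-based stand-in for a real syntactic categorizer.

    Returns None when no category applies, which marks the utterance
    as uncategorizable.
    """
    text = utterance.strip()
    if not text:
        return None
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "imperative"
    if text.endswith("."):
        # Very short period-final utterances are treated as fragments.
        return "declarative" if len(text.split()) >= 3 else "fragment"
    return None


def build_curriculum(
    corpus: List[str],
    categorize: Callable[[str], Optional[str]] = toy_categorize,
) -> List[Tuple[str, str]]:
    """Keep only categorizable utterances and order them by assumed stage."""
    labeled = [(categorize(u), u) for u in corpus]
    kept = [(c, u) for c, u in labeled if c in STAGE_ORDER]
    # list.sort is stable, so corpus order is preserved within each stage.
    kept.sort(key=lambda pair: STAGE_ORDER[pair[0]])
    return kept


if __name__ == "__main__":
    demo = [
        "The dog chased the ball.",
        "where is it?",
        "@@ 12 ##",
        "Look!",
        "More juice.",
    ]
    for category, utterance in build_curriculum(demo):
        print(category, "|", utterance)
```

Using the filtered list without the sort step would correspond to training on the categorizable subset with no curriculum, which is the comparison the abstract highlights as the main source of improvement.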
Arzu Burcu Güven
IT University of Copenhagen, Denmark
Anna Rogers
IT University of Copenhagen
Natural Language Processing, Language Models, Artificial Intelligence, AI and society
Rob van der Goot
IT University of Copenhagen, Denmark