🤖 AI Summary
This study investigates whether syntactic category information can enhance the effectiveness of developmentally motivated curriculum learning in language model training and elucidates the explanatory role of syntactic knowledge in linguistic competence.
Method: Leveraging the BabyLM and CHILDES corpora, we systematically characterize syntactic distribution patterns across developmental stages of child language acquisition, and propose a cognitively inspired, syntax-aware curriculum strategy that dynamically selects training subsets based on syntactic complexity and token frequency.
Contribution/Results: Experiments demonstrate that syntax-classifiable subsets, contrasted with noisy full-data baselines, yield consistent improvements on downstream tasks such as reading comprehension (+3.2% average gain). Moreover, the curriculum progression closely aligns with empirically observed human language development trajectories. To our knowledge, this is the first work to integrate fine-grained syntactic structure modeling into a developmental curriculum learning framework, establishing a paradigm for interpretable, cognitively grounded language model training.
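The subset-selection idea described above can be sketched as a toy difficulty ordering. This is a minimal illustration, not the paper's actual method: the function name and the two proxies (mean token rarity and sentence length as stand-ins for token frequency and syntactic complexity) are hypothetical.

```python
from collections import Counter

def curriculum_order(sentences, n_stages=3):
    """Toy sketch: order sentences from 'easy' to 'hard' using two
    proxies -- mean token rarity and sentence length -- then split
    the ranked data into successive curriculum stages.
    (Hypothetical stand-ins for the paper's syntactic-complexity
    and token-frequency measures.)"""
    tokens = [s.lower().split() for s in sentences]
    freq = Counter(t for toks in tokens for t in toks)

    def difficulty(toks):
        # rarer words and longer sentences => harder
        rarity = sum(1.0 / freq[t] for t in toks) / len(toks)
        return rarity * len(toks)

    ranked = sorted(tokens, key=difficulty)
    stage_size = -(-len(ranked) // n_stages)  # ceiling division
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]
```

A real implementation would replace these proxies with parser-derived syntactic categories and corpus-level frequency statistics, but the staging logic (rank, then partition into training phases) is the same shape.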
📝 Abstract
We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore a developmental curriculum and several alternative cognitively inspired approaches. We find that some curricula help with reading tasks, but the main performance improvement comes from using the subset of syntactically categorizable data rather than the full noisy corpus.