🤖 AI Summary
This work addresses the challenge of quantifying skill acquisition and forgetting in continual learning for language models. We propose the first developmental psychology–inspired evaluation framework, structuring assessment across five stages aligned with the cognitive development trajectories of children aged 5–10. The framework introduces an interpretable skill graph that models hierarchical dependencies among linguistic and reasoning abilities, together with a large-scale synthetic dataset (23.4B tokens) featuring controlled lexical complexity and diverse formatting. It is the first to systematically integrate human developmental theory into continual-learning evaluation for LLMs, enabling fine-grained analysis of forward transfer, backward transfer, and skill forgetting. Experiments on a 135M-parameter Transformer demonstrate that the framework exposes trade-offs between skill retention and transfer under three distinct training paradigms (task-isolated, joint, and sequential), thereby establishing a reproducible, interpretable, and developmentally grounded benchmark for continual learning.
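The hierarchical skill graph described above can be pictured as a directed acyclic graph in which an edge means one ability is a prerequisite of another. The paper does not publish its graph structure here, so the sketch below uses hypothetical skill names and edges purely to illustrate how such a dependency graph yields a valid learning order via topological sorting.

```python
from collections import defaultdict, deque

def learning_order(edges):
    """Topologically sort skills so every prerequisite precedes its dependents.

    `edges` is a list of (prerequisite, skill) pairs forming a DAG.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for pre, skill in edges:
        graph[pre].append(skill)
        indegree[skill] += 1
        nodes.update((pre, skill))

    # Kahn's algorithm: repeatedly emit skills with no unmet prerequisites.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        skill = queue.popleft()
        order.append(skill)
        for nxt in graph[skill]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("skill graph contains a cycle")
    return order

# Hypothetical dependencies: decoding and vocabulary feed comprehension,
# which in turn feeds inference.
edges = [("letter_sounds", "decoding"),
         ("decoding", "comprehension"),
         ("vocabulary", "comprehension"),
         ("comprehension", "inference")]
print(learning_order(edges))
```

Stage boundaries in such a framework would then correspond to cuts through this DAG, with each stage introducing only skills whose prerequisites appear in earlier stages.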
📝 Abstract
We introduce a comprehensive continual learning dataset and benchmark (CurlL) grounded in human developmental trajectories from ages 5–10, enabling systematic, fine-grained assessment of a model's ability to progressively acquire new skills. CurlL spans five developmental stages (0–4) covering ages 5–10, supported by a skill graph that decomposes broad skills into smaller abilities, concrete goals, and measurable indicators, while also capturing which abilities build on others. We generate a 23.4B-token synthetic dataset with controlled skill progression, vocabulary complexity, and format diversity, comprising paragraphs, comprehension-based QA (CQA), skill-testing QA (CSQA), and instruction-response (IR) pairs. Stage-wise token counts range from 2.12B to 6.78B, supporting precise analysis of forgetting, forward transfer, and backward transfer. Using a 135M-parameter transformer trained under independent, joint, and sequential (continual) setups, we demonstrate trade-offs between skill retention and transfer efficiency. By mirroring human learning patterns and providing fine-grained control over skill dependencies, this work advances continual learning evaluation for language models.
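The forgetting and backward-transfer quantities named in the abstract are not defined in this excerpt; the sketch below uses the standard continual-learning formulation (an accuracy matrix `R[i][j]` giving the score on stage `j` after training through stage `i`), which is an assumption about how CurlL's analysis might be instantiated rather than the paper's exact metric.

```python
def backward_transfer(R):
    """BWT: mean change on earlier stages after the final training stage.

    R[i][j] = score on stage j evaluated after training through stage i.
    Negative values indicate net forgetting of earlier stages.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

def forgetting(R):
    """Mean drop from each earlier stage's best score to its final score."""
    T = len(R)
    return sum(max(R[i][j] for i in range(j, T)) - R[T - 1][j]
               for j in range(T - 1)) / (T - 1)

# Toy 3-stage run: rows are checkpoints after each stage, columns are stages.
R = [[0.8, 0.1, 0.0],
     [0.7, 0.9, 0.2],
     [0.6, 0.8, 0.9]]
print(backward_transfer(R))  # negative: earlier stages degraded
print(forgetting(R))
```

Under this formulation, the independent, joint, and sequential training setups differ in how `R` is produced: independent training fills only the diagonal, joint training yields a single row, and sequential training yields the full lower-triangular history needed for these metrics.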