Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language model (LLM) pretraining relies on static, uniform data distributions, ignoring the dynamic evolution of model capabilities during training and thus limiting learning efficiency. To address this, we propose the first dynamic preference-aware curriculum learning framework for LLM continual pretraining. Our method quantifies sample difficulty via perplexity difference (PD) and constructs a predictive function modeling evolving model preferences, enabling offline full-dataset orchestration and uninterrupted, adaptive scheduling throughout training. It eliminates the need for manually designed curricula or online resampling. Experiments on 1.3B- and 3B-parameter models show that our approach significantly surpasses strong baselines, with the 3B model gaining over 4.1% in average accuracy across multiple benchmarks—demonstrating both the efficacy and scalability of dynamically adapting curricula to the model's evolving competence.
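The PD metric contrasts how well a strong and a weak model fit each sample. A minimal sketch of one plausible formulation — the paper's exact normalization is not given here, so the relative-gap formula below is an assumption:

```python
import math

def perplexity(mean_nll_per_token: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood."""
    return math.exp(mean_nll_per_token)

def perplexity_difference(ppl_weak: float, ppl_strong: float) -> float:
    """Hypothetical PD score: the relative gap between the weak model's and
    the strong model's perplexity on a sample. Large values mark samples the
    weak model struggles with, which the curriculum defers to later stages."""
    return (ppl_weak - ppl_strong) / ppl_weak
```

A sample on which the weak model has perplexity 20.0 while the strong model reaches 10.0 gets PD 0.5; a sample both models fit equally gets PD 0.0.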

📝 Abstract
Current large language models (LLMs) generally utilize a consistent data distribution throughout the entire pretraining process. However, as the model's ability improves, it should intuitively be pretrained on differentiated data. To achieve this, we propose the Perplexity Difference based Preference Curriculum learning (PDPC) framework, which continually perceives and uses the data preferred by LLMs to train and boost them. First, we introduce the PD metric to measure the difference in how well strong and weak models fit a sample. Samples with high PD are harder for weak models to learn and are better placed in the later stages of pretraining. Second, we propose the PD preference function, which predicts the LLM's data preference at any point in training, so that the entire dataset can be arranged offline and training proceeds without interruption. Experimental results on 1.3B and 3B models demonstrate that PDPC significantly surpasses baselines. Notably, the 3B model achieved more substantial gains, with average accuracy improving by over 4.1% across various benchmarks.
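The offline orchestration step described above can be sketched as a one-shot ordering of the corpus by PD, so that high-PD samples land later in training. This is only an illustration of the offline, no-resampling idea; the paper's PD preference function models preference as a function of training progress, which this simple global sort does not capture:

```python
def schedule_offline(samples: list[tuple[str, float]]) -> list[str]:
    """samples: list of (sample_id, pd_score) pairs.

    Orders the whole corpus once, ascending by PD, so samples that are
    harder for the weak model (high PD) appear in the later stage of
    pretraining. Because the ordering is computed before training starts,
    no online resampling or curriculum interruption is needed.
    """
    return [sid for sid, pd in sorted(samples, key=lambda pair: pair[1])]
```

For example, `schedule_offline([("a", 0.9), ("b", 0.1), ("c", 0.5)])` places the easy sample `"b"` first and the hard sample `"a"` last.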
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Learning Efficiency
Adaptive Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

PDPC
Perplexity Difference based Preference Curriculum
Dynamic Learning Material Adjustment