PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing continual pre-training (CPT) scaling laws assume a fixed pre-training budget and fail to generalize across tokens-per-parameter (PTPP) regimes, hindering accurate prediction of domain-adaptation performance under diverse resource constraints. Method: the authors propose a PTPP-aware scaling law that treats PTPP as an explicit variable, enabling cross-stage prediction of target-domain loss and principled planning of replay ratios and adaptation budgets under resource constraints. Within a multilingual CPT framework, they quantify the predictive power of early-stage data using Huber-on-log loss, relative MAE, and calibration slope. Contribution/Results: the method achieves accurate out-of-distribution prediction; for example, fits from models trained at PTPP = 15 or 31 reliably forecast target loss at PTPP = 279, outperforming PTPP-agnostic baselines across the reported metrics and enabling robust, resource-aware CPT scheduling and domain adaptation.
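
To make the idea concrete, here is a minimal sketch of how such a PTPP-aware law could be fit with a Huber-on-log objective. The functional form (a PTPP-dependent offset and amplitude over an adaptation-token power law) and all constants are illustrative assumptions, not the paper's exact parameterization:

```python
# Hypothetical sketch: fitting a PTPP-aware adaptation scaling law.
# Assumed form: L(D_adapt; PTPP) = E(PTPP) + A(PTPP) / D_adapt**alpha.
import numpy as np
from scipy.optimize import least_squares

def predicted_loss(params, ptpp, d_adapt):
    e0, e1, gamma, a0, a1, alpha = params
    irreducible = e0 + e1 * ptpp ** (-gamma)  # better-pretrained models start lower
    amplitude = a0 + a1 * ptpp ** (-gamma)
    return irreducible + amplitude * d_adapt ** (-alpha)

def residuals_log(params, ptpp, d_adapt, loss_obs):
    # Log-space residuals; the Huber penalty is applied by least_squares below.
    return np.log(predicted_loss(params, ptpp, d_adapt)) - np.log(loss_obs)

# Synthetic observations standing in for early PTPP stages (15 and 31).
rng = np.random.default_rng(0)
ptpp = np.repeat([15.0, 31.0], 8)
d_adapt = np.tile(np.geomspace(1e9, 5e10, 8), 2)  # adaptation tokens
true = predicted_loss([1.9, 1.2, 0.35, 8.0, 4.0, 0.28], ptpp, d_adapt)
loss_obs = true * np.exp(rng.normal(0.0, 0.01, true.shape))

fit = least_squares(
    residuals_log, x0=[2.0, 1.0, 0.3, 5.0, 2.0, 0.3],
    args=(ptpp, d_adapt, loss_obs),
    loss="huber", f_scale=0.05,  # Huber-on-log objective
    bounds=([0, 0, 0, 0, 0, 0], np.inf),
)

# Extrapolate to an unseen pre-training budget, e.g. PTPP = 279.
print(predicted_loss(fit.x, np.array([279.0]), np.array([2e10])))
```

The key point the sketch illustrates is that, once PTPP enters the law explicitly, extrapolation to an unseen pre-training budget is just an evaluation of the fitted function at a new PTPP value.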

📝 Abstract
Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP). We present *PTPP-aware* adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate *prediction* of adaptation loss at unseen PTPP. On a multilingual setup (English/Arabic → French), PTPP-aware formulations trained on early stages (PTPP ∈ {15, 31}) predict target loss at PTPP = 279 and outperform a PTPP-agnostic D-CPT transfer baseline on metrics (Huber-on-log, relative MAE, calibration slope); full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay ratios and adaptation token budgets that satisfy target-loss and forgetting constraints under compute limits.
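
The three fit-quality metrics named in the abstract can be written down compactly. The definitions below are one common reading (Huber applied to log-space residuals, MAE normalized by the observed loss, and the OLS slope of observed on predicted log-loss); the paper's exact conventions may differ:

```python
# Sketch of the evaluation metrics under assumed definitions.
import numpy as np

def huber_on_log(pred, obs, delta=0.05):
    """Huber penalty on log-space residuals; quadratic near zero, linear in the tails."""
    r = np.log(pred) - np.log(obs)
    quad = np.minimum(np.abs(r), delta)
    return np.mean(0.5 * quad**2 + delta * (np.abs(r) - quad))

def mae_rel(pred, obs):
    """Mean absolute error relative to the observed loss."""
    return np.mean(np.abs(pred - obs) / obs)

def calibration_slope(pred, obs):
    """OLS slope of log(obs) on log(pred); 1.0 means perfectly calibrated."""
    return np.polyfit(np.log(pred), np.log(obs), 1)[0]
```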
Problem

Research questions and friction points this paper is trying to address.

Predicting domain adaptation performance at unseen pre-training budgets
Balancing target domain gains with base domain stability
Planning replay ratios and token budgets under compute limits (see the sketch after this list)
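
A fitted law turns this planning problem into a simple constrained search. The sketch below grid-searches replay ratio and adaptation tokens against target-loss and forgetting constraints under a token budget; `target_loss` and `forgetting` are hypothetical stand-ins for the paper's fitted predictors, and all thresholds are illustrative:

```python
# Hypothetical planning sketch: pick the cheapest (replay, tokens) plan
# that meets a target-loss ceiling and a forgetting ceiling within budget.
import numpy as np
from itertools import product

def target_loss(replay, tokens):
    # Assumed shape: replay dilutes the effective target-domain tokens.
    eff = tokens * (1.0 - replay)
    return 1.9 + 8.0 * eff ** (-0.28)

def forgetting(replay, tokens):
    # Assumed shape: base-domain regression shrinks sharply with replay.
    return 0.4 * np.exp(-6.0 * replay) * (tokens / 1e10) ** 0.2

budget = 3e10                           # compute limit on adaptation tokens
replays = np.linspace(0.0, 0.5, 26)
tokens = np.geomspace(1e9, budget, 20)

feasible = [
    (r, t) for r, t in product(replays, tokens)
    if target_loss(r, t) <= 1.92 and forgetting(r, t) <= 0.15
]
# Cheapest feasible plan = fewest adaptation tokens.
best = min(feasible, key=lambda rt: rt[1]) if feasible else None
print(best)
```

With these assumed shapes, zero replay is infeasible (forgetting exceeds the ceiling at any useful token count), so the search naturally trades a modest replay ratio for a slightly larger adaptation budget.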
Innovation

Methods, ideas, or system contributions that make the work stand out.

PTPP-aware scaling laws predict adaptation loss
Explicit pre-training budget variable enables forecasting
Optimizes replay ratios under computational constraints