Constraint-based Pre-training: From Structured Constraints to Scalable Model Initialization

📅 2026-04-16
🤖 AI Summary
This work addresses the inflexibility of conventional pre-trained models, whose fixed sizes hinder adaptation to downstream tasks that require different model scales. The authors propose a pre-training paradigm based on structured constraints, introducing Kronecker factorization into pre-training for the first time. The approach decouples model weights into a scale-invariant, reusable weight template and a lightweight, data-driven weight scaler, which frames variable-scale model initialization as a multi-task adaptation problem. The method supports arbitrary depths and widths in both Transformer and CNN architectures, and it accelerates convergence and improves performance on image classification, image generation, and embodied control, enabling efficient and flexible model deployment.

📝 Abstract
The pre-training and fine-tuning paradigm has become the dominant approach for model adaptation. However, conventional pre-training typically yields models at a fixed scale, whereas practical deployment often requires models of varying sizes, exposing its limitations when target model scales differ from those used during pre-training. To address this, we propose an innovative constraint-based pre-training paradigm that imposes structured constraints during pre-training to disentangle size-agnostic knowledge into reusable weight templates, while assigning size-specific adaptation to lightweight weight scalers, thereby reformulating variable-sized model initialization as a multi-task adaptation problem. Within this paradigm, we further introduce WeiT, which employs Kronecker-based constraints to regularize the pre-training process. Specifically, model parameters are represented as compositions of weight templates via concatenation and weighted aggregation, with adaptive connections governed by lightweight weight scalers whose parameters are learned from limited data. This design enables flexible and efficient construction of model weights across diverse downstream scales. Extensive experiments demonstrate the efficiency and effectiveness of WeiT, achieving state-of-the-art performance in initializing models with varying depths and widths across a broad range of perception and embodied learning tasks, including Image Classification, Image Generation, and Embodied Control. Moreover, its effectiveness generalizes to both Transformer-based and Convolution-based architectures, consistently enabling faster convergence and improved performance even under full training.
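The composition described in the abstract can be illustrated with a small NumPy sketch. It is only an illustration of the general idea, not the paper's implementation: it assumes a weight matrix is formed as a sum of Kronecker products between size-specific scalers and shared templates, so each block of the result is a weighted aggregation of templates and the blocks are concatenated into the full matrix. The function name `build_weight`, the shapes, and the random values are hypothetical.

```python
import numpy as np

def build_weight(templates, scalers):
    """Compose a weight matrix as sum_k kron(scalers[k], templates[k]).

    templates: list of (p, q) arrays shared across target sizes (hypothetical).
    scalers:   list of (a, b) arrays, one per template, specific to one target size.
    Returns an (a * p, b * q) matrix.
    """
    if len(templates) != len(scalers):
        raise ValueError("need exactly one scaler per template")
    return sum(np.kron(s, t) for s, t in zip(scalers, templates))

rng = np.random.default_rng(0)

# Shared templates: these stand in for the size-agnostic knowledge.
templates = [rng.standard_normal((4, 4)) for _ in range(3)]

# Lightweight scalers: one set per target width (random here, learned in practice).
small_scalers = [rng.standard_normal((2, 2)) for _ in range(3)]
large_scalers = [rng.standard_normal((6, 6)) for _ in range(3)]

w_small = build_weight(templates, small_scalers)  # shape (8, 8)
w_large = build_weight(templates, large_scalers)  # shape (24, 24)
```

In this sketch the same 48 template parameters produce both an 8×8 and a 24×24 matrix, and only the scalers (12 and 108 parameters respectively) change with the target size.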
Problem

Research questions and friction points this paper is trying to address.

pre-training
model scaling
variable-sized models
model initialization
transfer learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

constraint-based pre-training
weight templates
weight scalers
Kronecker constraints
scalable model initialization
Authors
Fu Feng
School of Computer Science and Engineering, Southeast University, Nanjing, China and the Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Yucheng Xie
School of Computer Science and Engineering, Southeast University, Nanjing, China and the Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Ruixiao Shi
School of Computer Science and Engineering, Southeast University, Nanjing, China and the Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Jing Wang
Nanjing University
Bandit
Xin Geng
School of Computer Science and Engineering, Southeast University
Artificial Intelligence, Pattern Recognition, Machine Learning