One-for-All Model Initialization with Frequency-Domain Knowledge

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes FRONT, a novel framework that decouples general-purpose knowledge from the specific architecture of pretrained models, enabling cross-scale reuse without retraining. The authors demonstrate for the first time that the low-frequency components of model weights—extracted via discrete cosine transform—encode task-agnostic knowledge, which they term “learngene.” This learngene can be directly used to initialize downstream models of arbitrary size through simple truncation or padding. To further enhance transferability, the framework incorporates spectral regularization during pretraining. Extensive experiments show that FRONT accelerates convergence by up to 15× on vision tasks and reduces training FLOPs by 40.5% on average in language tasks, achieving state-of-the-art performance.
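The frequency-domain extraction described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the orthonormal DCT-II construction is standard, but the `keep_ratio` hyperparameter and the per-matrix treatment of weights are assumptions.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (row k = frequency k)."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)  # DC row scaled to 1/sqrt(n) for orthonormality
    return m

def extract_learngene(weight: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Return the low-frequency block of a 2-D weight matrix's DCT spectrum.

    `keep_ratio` is a hypothetical hyperparameter (not from the paper):
    the fraction of each axis whose lowest frequencies are retained.
    """
    h, w = weight.shape
    spectrum = dct_matrix(h) @ weight @ dct_matrix(w).T  # 2-D DCT
    kh, kw = max(1, int(h * keep_ratio)), max(1, int(w * keep_ratio))
    return spectrum[:kh, :kw]
```

Because the DCT basis is orthonormal, the retained block can later be inverted back to weight space by transposing the same basis matrices.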

📝 Abstract
Transferring knowledge by fine-tuning large-scale pre-trained networks has become a standard paradigm for downstream tasks, yet the knowledge of a pre-trained model is tightly coupled with its monolithic architecture, which restricts flexible reuse across models of varying scales. In response to this challenge, recent approaches typically resort to either parameter selection, which fails to capture the interdependent structure of this knowledge, or parameter prediction using generative models that depend on impractical access to large network collections. In this paper, we empirically demonstrate that a model's foundational, task-agnostic knowledge, its "learngene", is encoded within the low-frequency components of its weights, and can be efficiently inherited by downstream models. Based on this insight, we propose FRONT (FRequency dOmain kNowledge Transfer), a novel framework that uses the Discrete Cosine Transform (DCT) to isolate the low-frequency "learngene". This learngene can be seamlessly adapted to initialize models of arbitrary size via simple truncation or padding, a process that is entirely training-free. For enhanced performance, we propose an optional low-cost refinement process that introduces a spectral regularizer to further improve the learngene's transferability. Extensive experiments demonstrate that FRONT achieves state-of-the-art performance, accelerates convergence by up to 15 times in vision tasks, and reduces training FLOPs by an average of 40.5% in language tasks.
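The abstract mentions an optional refinement step with a spectral regularizer. The paper's exact regularizer is not given here; one plausible form, sketched below under that assumption, penalizes the energy of a weight matrix's high-frequency DCT coefficients so that knowledge concentrates in the low-frequency band. Both `keep_ratio` and the squared-energy form are illustrative choices, not the authors' formulation.

```python
import numpy as np

def _dct(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def spectral_penalty(weight: np.ndarray, keep_ratio: float = 0.25) -> float:
    """Hypothetical spectral regularizer: squared energy of the weight's
    DCT spectrum outside the low-frequency block. Adding
    lam * spectral_penalty(W) to the training loss would push the
    network to store its knowledge in low frequencies."""
    h, w = weight.shape
    s = _dct(h) @ weight @ _dct(w).T
    kh, kw = max(1, int(h * keep_ratio)), max(1, int(w * keep_ratio))
    total = float(np.sum(s ** 2))
    low = float(np.sum(s[:kh, :kw] ** 2))
    return total - low
```

A constant weight matrix has all of its spectral energy at DC, so this penalty is zero for it; noisy weights are penalized in proportion to their high-frequency content.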
Problem

Research questions and friction points this paper is trying to address.

knowledge transfer
model reuse
pre-trained models
architecture coupling
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-domain knowledge transfer
Discrete Cosine Transform
Model initialization
Cross-scale transfer
Training-free adaptation
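The training-free, cross-scale adaptation listed above can be sketched as follows: the low-frequency DCT block (the learngene) is truncated or zero-padded to the target spectrum size, then inverted back to weight space. Zero-padding and a plain inverse DCT are assumptions about the adaptation step, not the paper's exact procedure.

```python
import numpy as np

def _dct(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def init_from_learngene(learngene: np.ndarray, shape: tuple) -> np.ndarray:
    """Training-free cross-scale initialization (illustrative sketch).

    The learngene block is placed in the low-frequency corner of a
    zero spectrum of the target size (truncated if the target is
    smaller, zero-padded if larger), then inverted via the transposed
    orthonormal DCT bases.
    """
    h, w = shape
    spectrum = np.zeros(shape)
    kh = min(learngene.shape[0], h)
    kw = min(learngene.shape[1], w)
    spectrum[:kh, :kw] = learngene[:kh, :kw]
    return _dct(h).T @ spectrum @ _dct(w)  # inverse 2-D DCT
```

Since the same learngene block serves any target shape, a single pretrained extraction can initialize both smaller and larger downstream models.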
Jianlu Shen
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Fu Feng
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Yucheng Xie
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Jiaqi Lv
Southeast University
Machine Learning
Xin Geng
School of Computer Science and Engineering, Southeast University
Artificial Intelligence; Pattern Recognition; Machine Learning