Model Assembly Learning with Heterogeneous Layer Weight Merging

📅 2025-03-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of enhancing foundation model capabilities without requiring additional data or training. We propose Model Assembly Learning (MAL), a novel paradigm enabling open-parameter fusion across heterogeneous architectures. MAL introduces three key mechanisms: layer-width-adaptive weight mapping, selective parameter injection, and loss-basin-agnostic iterative fusion. Together, these mechanisms enable, for the first time, non-aligned parameter merging across architectures and layer widths, breaking the conventional assumptions of linear mode connectivity and architectural homogeneity. Theoretically, we establish feasibility conditions and practical guidelines for heterogeneous parameter merging. Empirically, MAL consistently improves foundation model performance across diverse multi-task benchmarks, demonstrating both effectiveness and generalizability. Overall, MAL provides a scalable, training-free pathway for open model collaboration, advancing the frontier of parameter-efficient model integration.
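To make the width-mismatch handling concrete, below is a minimal sketch of what layer-width-adaptive weight mapping combined with interpolation could look like. The overlap-and-interpolate rule, the `merge_layer` name, and the fixed coefficient `alpha` are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of width-adaptive layer merging (an assumption about the
# mechanism, not the paper's exact algorithm): interpolate base and target
# weights only on their overlapping rows/columns, leaving the base model's
# extra units untouched.
import numpy as np

def merge_layer(w_base: np.ndarray, w_target: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Merge a target layer into a base layer with a possibly different width."""
    merged = w_base.copy()
    rows = min(w_base.shape[0], w_target.shape[0])
    cols = min(w_base.shape[1], w_target.shape[1])
    # Interpolate on the shared sub-block; the base's remaining weights stay as-is.
    merged[:rows, :cols] = (1 - alpha) * w_base[:rows, :cols] + alpha * w_target[:rows, :cols]
    return merged

# Example: a 4x8 base layer absorbing a 6x6 target layer.
w_base = np.random.randn(4, 8)
w_target = np.random.randn(6, 6)
print(merge_layer(w_base, w_target).shape)  # (4, 8): the base shape is preserved
```

The design choice in this sketch is to touch only the sub-block where both layers overlap, so the base model's extra units survive the merge unchanged.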

📝 Abstract
Model merging combines the parameters of multiple models to acquire general capabilities without extra data or training. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin via permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities. Unlike previous works that require identical architectures, MAL allows the merging of heterogeneous architectures and of selective parameters across layers: the base model can incorporate parameters from different layers of multiple pre-trained models. We systematically investigate the conditions and fundamental settings of heterogeneous parameter merging, addressing all possible mismatches in layer widths between the base and target models. Furthermore, we establish key laws and provide practical guidelines for effectively implementing MAL.
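As a companion to the mapping sketch above, here is a minimal, hypothetical rendering of the iterative assembly loop: each candidate layer from the model zoo is injected greedily and kept only if a held-out validation score improves. The greedy accept/reject rule and the `evaluate` callback are assumptions for illustration, and the code reuses `merge_layer` from the earlier sketch.

```python
# Minimal sketch of an iterative assembly loop (a greedy accept/reject rule is
# assumed here; the paper's selection criterion may differ). Reuses merge_layer
# from the sketch above.

def assemble(base_params: dict, model_zoo: list, evaluate, alpha: float = 0.3) -> dict:
    """Iteratively inject layers from zoo models into the base model,
    keeping each merge only if it improves the validation score."""
    best_score = evaluate(base_params)
    for target_params in model_zoo:                  # candidate pre-trained models
        for name, w_target in target_params.items():
            if name not in base_params:
                continue                             # skip layers the base lacks
            w_old = base_params[name]
            base_params[name] = merge_layer(w_old, w_target, alpha)
            score = evaluate(base_params)
            if score > best_score:
                best_score = score                   # keep the improved merge
            else:
                base_params[name] = w_old            # revert this layer
    return base_params
```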
Problem

Research questions and friction points this paper aims to address.

Merging heterogeneous model architectures without identical structures
Enhancing base models by integrating parameters from diverse pre-trained models
Addressing layer width mismatches in heterogeneous parameter merging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iteratively merges parameters from diverse models
Allows merging heterogeneous architectures and layers
Addresses layer width mismatches systematically
Yi-Kai Zhang
Nanjing University
Model Recommendation, Multimodal Large Language Model

Jin Wang
Yingcai Honors College, University of Electronic Science and Technology of China

Xu-Xiang Zhong
School of Artificial Intelligence, Nanjing University; National Key Laboratory for Novel Software Technology, Nanjing University

De-Chuan Zhan
Nanjing University, China
Machine Learning, Data Mining

Han-Jia Ye
Nanjing University
Machine Learning, Data Mining, Metric Learning, Meta-Learning