Transport and Merge: Cross-Architecture Merging for Large Language Models

πŸ“… 2026-02-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of efficiently transferring knowledge from large-scale models to structurally heterogeneous, small-scale, low-resource language models. The authors propose a cross-architecture model fusion framework grounded in optimal transport theory. By aligning activations to infer neuron correspondences between heterogeneous architectures, the method derives a transport plan that directly merges weight spaces, enabling effective knowledge transfer with minimal input data. As the first approach to distill knowledge from large models into small ones without requiring architectural consistency, it overcomes a key limitation of conventional fusion techniques. Experimental results demonstrate substantial improvements over baseline small models across diverse low-resource languages and specialized downstream tasks, confirming the method's effectiveness and broad applicability.

πŸ“ Abstract
Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over target models.
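The pipeline the abstract describes — align activations to infer neuron correspondences, solve for a transport plan, then fuse weights through it — can be illustrated with a small sketch. This is not the paper's implementation; it assumes entropic (Sinkhorn) optimal transport over uniform marginals, a squared-distance cost between per-neuron activation profiles, barycentric projection of source weights, and linear interpolation with mixing weight `alpha`. All function names and hyperparameters (`reg`, `alpha`, iteration count) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sinkhorn(C, reg=0.5, n_iters=500):
    """Entropic OT plan between uniform marginals over source/target neurons."""
    d_s, d_t = C.shape
    a, b = np.full(d_s, 1.0 / d_s), np.full(d_t, 1.0 / d_t)
    K = np.exp(-C / reg)          # Gibbs kernel of the cost matrix
    u = np.ones(d_s)
    for _ in range(n_iters):      # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan, shape (d_s, d_t)

def ot_merge(A_s, A_t, W_s, W_t, alpha=0.5, reg=0.5):
    """Align source neurons to target neurons via activations, then merge weights.

    A_s: (n, d_s) source activations on n shared probe inputs
    A_t: (n, d_t) target activations on the same inputs (d_s != d_t allowed)
    W_s: (d_s, k) source weights; W_t: (d_t, k) target weights
    """
    # normalize each neuron's activation profile so cost reflects shape, not scale
    P = A_s / (np.linalg.norm(A_s, axis=0) + 1e-8)
    Q = A_t / (np.linalg.norm(A_t, axis=0) + 1e-8)
    # squared-distance cost between every source/target neuron pair
    C = (P**2).sum(0)[:, None] + (Q**2).sum(0)[None, :] - 2.0 * P.T @ Q
    T = sinkhorn(C, reg=reg)
    # barycentric projection: re-express source weights in the target's neuron basis
    W_proj = (T / T.sum(axis=0, keepdims=True)).T @ W_s
    return (1 - alpha) * W_t + alpha * W_proj, T

# toy demo: 16 probe inputs, source layer with 8 neurons, target with 5
n, d_s, d_t, k = 16, 8, 5, 3
A_s, A_t = rng.normal(size=(n, d_s)), rng.normal(size=(n, d_t))
W_s, W_t = rng.normal(size=(d_s, k)), rng.normal(size=(d_t, k))
merged, T = ot_merge(A_s, A_t, W_s, W_t)
print(merged.shape)   # (5, 3): the merged weights keep the target architecture
```

Because the transport plan has shape `(d_s, d_t)`, the projection step maps weights between layers of different widths — which is what lets the merge cross architectures, needing only the small set of probe inputs used to collect activations.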
Problem

Research questions and friction points this paper is trying to address.

cross-architecture
model merging
knowledge transfer
large language models
low-resource
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-architecture merging
optimal transport
large language models
knowledge transfer
model fusion