Probabilistic Token Alignment for Large Language Model Fusion

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language model (LLM) fusion approaches rely on manually predefined vocabulary alignment, limiting adaptability to diverse contextual settings and hindering fusion efficacy. To address this, we propose a probabilistic token alignment framework grounded in optimal transport, which formulates alignment as a soft mapping problem between token distributions—enabling automatic, interpretable, and architecture-agnostic alignment across heterogeneous models. Our method integrates distribution-aware learning with probabilistic mapping modeling, eliminating manual intervention while achieving fine-grained, semantics-preserving token-level matching and supporting end-to-end parameter fusion. Extensive evaluations across multiple benchmarks demonstrate that the fused models consistently outperform baselines in reasoning, commonsense understanding, and linguistic comprehension, validating the method’s effectiveness, robustness, and generalizability. The implementation is publicly available.

📝 Abstract
Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A more cost-effective alternative is to fuse existing pre-trained LLMs with different architectures into a more powerful model. However, a key challenge in existing model fusion is its dependence on manually predefined vocabulary alignment, which may not generalize well across diverse contexts, leading to performance degradation on several evaluations. To solve this, we draw inspiration from distribution learning and propose a probabilistic token alignment method as a general, soft mapping for alignment, named PTA-LLM. Our approach innovatively reformulates token alignment into a classic mathematical problem: optimal transport, seamlessly leveraging distribution-aware learning to facilitate more coherent model fusion. Apart from its inherent generality, PTA-LLM exhibits interpretability from a distributional perspective, offering insights into the essence of token alignment. Empirical results demonstrate that probabilistic token alignment enhances the target model's performance across multiple capabilities. Our code is available at https://runjia.tech/neurips_pta-llm/.
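The core idea of casting token alignment as entropy-regularized optimal transport can be sketched with a few Sinkhorn iterations. This is a minimal illustration of the general technique, not the paper's implementation: the embeddings are random toy data, and the function name, cost choice (cosine distance), and hyperparameters are assumptions for the example.

```python
import numpy as np

def sinkhorn_alignment(cost, reg=0.1, n_iters=100):
    """Soft token-alignment matrix via entropy-regularized optimal
    transport (Sinkhorn iterations).

    cost : (n, m) pairwise cost between source and target tokens.
    Returns a transport plan of shape (n, m) whose rows/columns
    approximately match uniform marginals over the two vocabularies.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass over source tokens
    b = np.full(m, 1.0 / m)   # uniform mass over target tokens
    K = np.exp(-cost / reg)   # Gibbs kernel from the cost matrix
    u = np.ones(n)
    for _ in range(n_iters):  # alternate marginal-matching scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return np.diag(u) @ K @ np.diag(v)

# Toy example: cost from cosine distance between token embeddings
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))   # 4 source-vocabulary embeddings (toy)
tgt = rng.normal(size=(5, 8))   # 5 target-vocabulary embeddings (toy)
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
cost = 1.0 - src @ tgt.T        # cosine distance

plan = sinkhorn_alignment(cost)
# Each source token's unit of mass spreads softly over target tokens,
# rather than being forced into a single hard, predefined match.
```

Unlike a manually predefined one-to-one vocabulary mapping, the resulting plan distributes each source token's probability mass across several target tokens, which is the "soft mapping" the abstract refers to.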
Problem

Research questions and friction points this paper is trying to address.

Fusing existing pre-trained LLMs with different architectures effectively
Overcoming manual vocabulary alignment limitations in model fusion
Achieving coherent token alignment across diverse language contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic token alignment for model fusion
Reformulates alignment as optimal transport problem
Uses distribution-aware learning for coherent fusion