Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address negative interference arising from parameter-space divergence during reasoning skill transfer across large language models (LLMs), this paper proposes a parameter-space alignment method grounded in architectural symmetry. Leveraging the inherent permutation, rotation, and scaling symmetries of the Transformer architecture, the authors introduce an "align-then-transfer" strategy that aligns both weights and activations and natively supports modern components such as Grouped-Query Attention (GQA) and SwiGLU. The approach integrates task arithmetic with symmetry-aware transformations, substantially improving cross-model skill composition. Experiments demonstrate consistent and significant gains over standard task arithmetic across multiple complex reasoning benchmarks, including GSM8K, MATH, and HumanEval, without modifying the target model's architecture. Crucially, the method transfers advanced reasoning capabilities to base models that lack such abilities, enabling reasoning capability injection while preserving the original model's integrity.

📝 Abstract
Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs), but it often suffers from negative interference when models have diverged during training. We address this limitation by first aligning the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures. We adapt parameter space alignment for modern Grouped-Query Attention (GQA) and SwiGLU layers, exploring both weight-based and activation-based approaches. Using this alignment-first strategy, we successfully transfer advanced reasoning skills to a non-reasoning model. Experiments on challenging reasoning benchmarks show that our method consistently outperforms standard task arithmetic. This work provides an effective approach for merging and transferring specialized skills across evolving LLM families, reducing redundant fine-tuning and enhancing model adaptability.
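The permutation symmetry the abstract relies on can be demonstrated in a few lines: reordering the hidden units of a feed-forward block (columns of the input projection together with the matching rows of the output projection) leaves the block's input-output map unchanged. The sketch below is a toy illustration with a ReLU MLP standing in for a Transformer FFN (the paper handles SwiGLU and GQA, and also rotation and scaling symmetries); all names and sizes are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(x, W_in, W_out):
    # Stand-in for a Transformer feed-forward block (ReLU instead of SwiGLU).
    return np.maximum(x @ W_in, 0.0) @ W_out

d_model, d_hidden = 8, 32
x = rng.normal(size=(4, d_model))
W_in = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))

# Permutation symmetry: permute the hidden units (columns of W_in and the
# matching rows of W_out); the function computed by the block is unchanged.
perm = rng.permutation(d_hidden)
out_original = ffn(x, W_in, W_out)
out_permuted = ffn(x, W_in[:, perm], W_out[perm, :])
assert np.allclose(out_original, out_permuted)
```

Because infinitely many weight settings realize the same function, two independently trained models can compute similar functions while sitting far apart in parameter space, which is exactly why naive weight arithmetic between them interferes.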
Problem

Research questions and friction points this paper is trying to address.

Addressing negative interference in task arithmetic for skill transfer between LLMs
Aligning parameter spaces using Transformer symmetries to enable reasoning transfer
Enhancing skill merging across evolving LLM families to reduce fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligning parameter spaces using Transformer symmetries
Adapting alignment for GQA and SwiGLU architectures
Transferring reasoning skills via alignment-first strategy
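The alignment-first idea behind these contributions can be sketched on toy weights: if a donor model lives in a permuted hidden basis, a naive task vector (donor minus target) is dominated by the basis mismatch rather than the fine-tuning update, whereas matching units first recovers a small, meaningful task vector. The greedy column matching below is a hypothetical stand-in for the paper's weight- and activation-based alignment, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 128, 16

# Toy setup: the donor is the target base expressed in a permuted hidden
# basis, then fine-tuned with a small additive update dW.
W_base = rng.normal(size=(d_model, d_hidden))
perm = rng.permutation(d_hidden)
dW = 0.1 * rng.normal(size=(d_model, d_hidden))
W_donor_ft = W_base[:, perm] + dW

# Naive task arithmetic: subtracting in mismatched bases buries the
# fine-tuning signal under the permutation mismatch.
tv_naive = W_donor_ft - W_base

# Align-then-transfer: estimate the unit correspondence by matching columns
# (greedy argmax here; an optimal assignment would be more robust), undo the
# permutation, then take the difference in the target's basis.
similarity = W_base.T @ W_donor_ft        # (d_hidden, d_hidden) column dots
match = similarity.argmax(axis=1)         # target unit i <- donor unit match[i]
tv_aligned = W_donor_ft[:, match] - W_base

assert np.array_equal(match, np.argsort(perm))  # permutation recovered
assert np.linalg.norm(tv_aligned) < np.linalg.norm(tv_naive)
```

The aligned task vector `tv_aligned` is just the fine-tuning update expressed in the target's coordinates, so adding it to the target (standard task arithmetic) transfers the skill without the interference the naive difference would inject.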