Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address negative interference arising from parameter-space divergence during reasoning skill transfer across large language models (LLMs), this paper proposes a parameter-space alignment method grounded in architectural symmetry. Leveraging the inherent permutation, rotation, and scaling symmetries of the Transformer architecture, the authors introduce an "align-then-transfer" strategy that aligns both weights and activations and natively supports modern components such as Grouped-Query Attention (GQA) and SwiGLU. The approach integrates task arithmetic with symmetry-aware transformations, substantially improving cross-model skill composition. Experiments demonstrate consistent and significant gains over standard task arithmetic across multiple complex reasoning benchmarks, including GSM8K, MATH, and HumanEval, without modifying the target model's architecture. Crucially, the method transfers advanced reasoning capabilities to base models that lack such abilities, enabling reasoning capability injection while preserving the original model's integrity.

📝 Abstract
Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs), but it often suffers from negative interference when models have diverged during training. We address this limitation by first aligning the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures. We adapt parameter space alignment for modern Grouped-Query Attention (GQA) and SwiGLU layers, exploring both weight-based and activation-based approaches. Using this alignment-first strategy, we successfully transfer advanced reasoning skills to a non-reasoning model. Experiments on challenging reasoning benchmarks show that our method consistently outperforms standard task arithmetic. This work provides an effective approach for merging and transferring specialized skills across evolving LLM families, reducing redundant fine-tuning and enhancing model adaptability.
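The permutation symmetry the abstract relies on can be demonstrated in a few lines: reordering the hidden units of a feed-forward block (columns of the input projection together with the matching rows of the output projection) leaves the block's input-output map unchanged. The sketch below is a toy illustration with a ReLU MLP standing in for a Transformer FFN (the paper handles SwiGLU and GQA, and also rotation and scaling symmetries); all names and sizes are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(x, W_in, W_out):
    # Stand-in for a Transformer feed-forward block (ReLU instead of SwiGLU).
    return np.maximum(x @ W_in, 0.0) @ W_out

d_model, d_hidden = 8, 32
x = rng.normal(size=(4, d_model))
W_in = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))

# Permutation symmetry: permute the hidden units (columns of W_in and the
# matching rows of W_out); the function computed by the block is unchanged.
perm = rng.permutation(d_hidden)
out_original = ffn(x, W_in, W_out)
out_permuted = ffn(x, W_in[:, perm], W_out[perm, :])
assert np.allclose(out_original, out_permuted)
```

Because infinitely many weight settings realize the same function, two independently trained models can compute similar functions while sitting far apart in parameter space, which is exactly why naive weight arithmetic between them interferes.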
Problem

Research questions and friction points this paper is trying to address.

Addressing negative interference in task arithmetic for skill transfer between LLMs
Aligning parameter spaces using Transformer symmetries to enable reasoning transfer
Enhancing skill merging across evolving LLM families to reduce fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligning parameter spaces using Transformer symmetries
Adapting alignment for GQA and SwiGLU architectures
Transferring reasoning skills via alignment-first strategy
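The alignment-first idea behind these contributions can be sketched on toy weights: if a donor model lives in a permuted hidden basis, a naive task vector (donor minus target) is dominated by the basis mismatch rather than the fine-tuning update, whereas matching units first recovers a small, meaningful task vector. The greedy column matching below is a hypothetical stand-in for the paper's weight- and activation-based alignment, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 128, 16

# Toy setup: the donor is the target base expressed in a permuted hidden
# basis, then fine-tuned with a small additive update dW.
W_base = rng.normal(size=(d_model, d_hidden))
perm = rng.permutation(d_hidden)
dW = 0.1 * rng.normal(size=(d_model, d_hidden))
W_donor_ft = W_base[:, perm] + dW

# Naive task arithmetic: subtracting in mismatched bases buries the
# fine-tuning signal under the permutation mismatch.
tv_naive = W_donor_ft - W_base

# Align-then-transfer: estimate the unit correspondence by matching columns
# (greedy argmax here; an optimal assignment would be more robust), undo the
# permutation, then take the difference in the target's basis.
similarity = W_base.T @ W_donor_ft        # (d_hidden, d_hidden) column dots
match = similarity.argmax(axis=1)         # target unit i <- donor unit match[i]
tv_aligned = W_donor_ft[:, match] - W_base

assert np.array_equal(match, np.argsort(perm))  # permutation recovered
assert np.linalg.norm(tv_aligned) < np.linalg.norm(tv_naive)
```

The aligned task vector `tv_aligned` is just the fine-tuning update expressed in the target's coordinates, so adding it to the target (standard task arithmetic) transfers the skill without the interference the naive difference would inject.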