Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
When merging multiple tasks in large language models, task-specific knowledge interferes with general instruction-following capability, degrading overall performance. To address this, we propose Layer-Aware Task Arithmetic (LATA), a method that explicitly decouples task knowledge from instruction-following ability by assigning layer-specific weights to task vectors based on each layer's alignment with instruction-following or task-specific components. LATA enables high-fidelity multi-task learning and selective task forgetting. Evaluated on WikiText-2, GSM8K, and HumanEval, it achieves an average multi-task accuracy improvement of 4.2% over baseline methods, while reducing output quality degradation by 63%. These gains significantly surpass those of standard task arithmetic, demonstrating superior task composition fidelity and robustness.

📝 Abstract
Large language models (LLMs) demonstrate strong task-specific capabilities through fine-tuning, but merging multiple fine-tuned models often leads to degraded performance due to overlapping instruction-following components. Task Arithmetic (TA), which combines task vectors derived from fine-tuning, enables multi-task learning and task forgetting but struggles to isolate task-specific knowledge from general instruction-following behavior. To address this, we propose Layer-Aware Task Arithmetic (LATA), a novel approach that assigns layer-specific weights to task vectors based on their alignment with instruction-following or task-specific components. By amplifying task-relevant layers and attenuating instruction-following layers, LATA improves task learning and forgetting performance while preserving overall model utility. Experiments on multiple benchmarks, including WikiText-2, GSM8K, and HumanEval, demonstrate that LATA outperforms existing methods in both multi-task learning and selective task forgetting, achieving higher task accuracy and alignment with minimal degradation in output quality. Our findings highlight the importance of layer-wise analysis in disentangling task-specific and general-purpose knowledge, offering a robust framework for efficient model merging and editing.
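The merging scheme the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: scalars stand in for weight tensors, and `layer_weights` is a hypothetical per-task, per-layer weight map (near 1 for task-relevant layers, near 0 for instruction-following layers) that LATA would derive from its alignment analysis.

```python
def layer_aware_merge(base_state, finetuned_states, layer_weights, scale=1.0):
    """Layer-aware task arithmetic sketch.

    base_state:       {layer_name: value} for the base model
                      (scalars stand in for weight tensors).
    finetuned_states: one such dict per fine-tuned task model.
    layer_weights:    one {layer_name: weight} dict per task; the
                      per-layer weights LATA assigns (assumed given here).
    """
    merged = dict(base_state)
    for task_idx, ft_state in enumerate(finetuned_states):
        for name, base_w in base_state.items():
            task_vector = ft_state[name] - base_w       # task vector for this layer
            w = layer_weights[task_idx].get(name, 1.0)  # amplify or attenuate
            merged[name] += scale * w * task_vector
    return merged
```

With a weight of 0 on a layer, that layer's task-vector contribution is dropped entirely, which is how attenuating instruction-following layers preserves the base model's general behavior there; plain Task Arithmetic is the special case where every weight is 1.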
Problem

Research questions and friction points this paper is trying to address.

Disentangling task-specific knowledge
Improving multi-task learning performance
Enhancing selective task forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Assigns layer-specific weights to task vectors
Amplifies task-relevant layers
Attenuates instruction-following layers
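One plausible way to realize the weighting the bullets describe is to score each layer's task-vector direction against an instruction-following direction and down-weight aligned layers. The sketch below assumes such an "instruction vector" is available (e.g. instruction-tuned minus base weights, flattened per layer); the cosine-based weighting rule is an illustrative choice, not the paper's exact formula.

```python
import math

def cosine(u, v):
    # Cosine similarity between two flat vectors (lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def alignment_weights(task_vec, instruct_vec):
    """Hypothetical per-layer weights: layers whose task-vector direction
    aligns with the instruction-following vector are attenuated (weight
    near 0), while orthogonal, task-specific layers are kept (weight near 1).
    """
    return {name: 1.0 - abs(cosine(task_vec[name], instruct_vec[name]))
            for name in task_vec}
```

A layer carrying mostly instruction-following change points the same way as the instruction vector and so receives a weight near zero, while a layer encoding task-specific change is roughly orthogonal and passes through almost untouched.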