Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing knowledge retention with task-specific specialization when adapting large language models (LLMs) to multiple tasks, this paper proposes Hybrid Contextual Attention Modulation (HyCAM). HyCAM combines a shared, full-parameter Contextual Attention Modulation (CAM) module with multiple specialized, lightweight CAM modules that dynamically modulate the representations of the self-attention layers, and uses a dynamic routing strategy to adaptively fuse shared and task-specific knowledge. This design mitigates catastrophic forgetting while keeping computational overhead low, enabling parameter-efficient multi-task adaptation. Evaluated on heterogeneous tasks, including question answering, code generation, and logical reasoning, HyCAM achieves an average performance gain of 3.65% over strong baselines and outperforms existing parameter-efficient fine-tuning methods. The implementation and datasets are publicly released.

📝 Abstract
Large Language Models (LLMs) possess remarkable generalization capabilities but struggle with multi-task adaptation, particularly in balancing knowledge retention with task-specific specialization. Conventional fine-tuning methods suffer from catastrophic forgetting and substantial resource consumption, while existing parameter-efficient methods perform suboptimally in complex multi-task scenarios. To address this, we propose Contextual Attention Modulation (CAM), a novel mechanism that dynamically modulates the representations of self-attention modules in LLMs. CAM enhances task-specific features while preserving general knowledge, thereby facilitating more effective and efficient adaptation. For effective multi-task adaptation, CAM is integrated into our Hybrid Contextual Attention Modulation (HyCAM) framework, which combines a shared, full-parameter CAM module with multiple specialized, lightweight CAM modules, enhanced by a dynamic routing strategy for adaptive knowledge fusion. Extensive experiments on heterogeneous tasks, including question answering, code generation, and logical reasoning, demonstrate that our approach significantly outperforms existing approaches, achieving an average performance improvement of 3.65%. Code and data are available at https://github.com/Applied-Machine-Learning-Lab/HyCAM to facilitate reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-task adaptation challenges in large language models
Mitigates catastrophic forgetting and high resource consumption issues
Enhances task-specific features while preserving general knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic modulation of self-attention representations in LLMs
Hybrid framework combining shared and specialized CAM modules
Dynamic routing strategy for adaptive knowledge fusion
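The innovation points above can be made concrete with a small sketch. The page does not give the paper's exact equations, so the following is a minimal, hypothetical NumPy illustration: it assumes each CAM module applies a learned elementwise scale-and-shift to the self-attention output, and that dynamic routing is a softmax gate over the shared and task-specific modules computed from a pooled hidden state. All function names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the routing gate
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cam_modulate(h, scale, shift):
    # Hypothetical CAM module: elementwise modulation of the
    # self-attention output h, shape (seq_len, d_model)
    return h * scale + shift

def hycam_fuse(h, shared_params, task_params_list, router_w):
    # Shared, full-parameter CAM module
    shared_out = cam_modulate(h, *shared_params)
    # Specialized, lightweight per-task CAM modules
    task_outs = [cam_modulate(h, s, b) for s, b in task_params_list]
    # Dynamic routing (assumption): softmax gate over modules,
    # computed from the mean-pooled hidden state
    logits = h.mean(axis=0) @ router_w          # (num_modules,)
    gates = softmax(logits)
    modules = np.stack([shared_out] + task_outs)  # (M, seq_len, d_model)
    # Adaptive knowledge fusion: convex combination of module outputs
    return np.einsum('m,mtd->td', gates, modules)
```

Because the gates sum to one, the fused output is a convex combination of the shared and task-specific module outputs; with identity modules (scale 1, shift 0) it reduces to the input representation.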
Dayan Pan
SCSE, Beihang University, ERC of ACAT, MOE, Beijing, China, City University of Hong Kong, Hong Kong, China
Zhaoyang Fu
Huawei Technologies Ltd., Shenzhen, China
Jingyuan Wang
SCSE, Beihang University, Key Lab of DIM, MIIT, SEM, Beihang University, Beijing, China
Xiao Han
Zhejiang University of Technology, Hangzhou, China
Yue Zhu
IBM Research
Performance Optimization · I/O · Storage · Cloud
Xiangyu Zhao
City University of Hong Kong, Hong Kong, China