Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons

📅 2024-07-09
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the dual challenges of generalization and catastrophic forgetting in large language model (LLM) multi-task learning. We propose a gradient-attribution-based method to identify task-specific neurons and empirically discover, for the first time, that the overlap degree among such neurons strongly correlates with cross-task generalization versus specialization. Furthermore, we reveal that parameter similarity in specific intermediate layers serves as an effective predictor of generalization performance. Leveraging these insights, we introduce a neuron-level continual learning paradigm: only neurons identified as task-relevant are fine-tuned per task. Experiments demonstrate that the localized neurons exhibit high correlation with task performance, and our approach significantly mitigates forgetting, outperforming standard fine-tuning across multiple continual learning benchmarks. These findings provide neuron-level, interpretable foundations for understanding LLM multi-task mechanisms and designing efficient, scalable continual learning strategies.
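The summary above describes two computable ingredients: a gradient-attribution score for locating task-specific neurons, and an overlap measure between the neuron sets of two tasks. A minimal sketch of both, using a toy one-hidden-layer ReLU network as a stand-in for an LLM (the network, function names, and scoring details are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def detect_task_neurons(W1, W2, X, y, top_k):
    """Rank hidden neurons by gradient attribution on task data.

    Toy stand-in for an LLM: one hidden ReLU layer with squared-error
    loss. Each hidden neuron is scored by the L1 norm of the gradient
    of the loss with respect to its incoming weights.
    """
    Z = X @ W1                   # pre-activations, shape (n, h)
    H = np.maximum(Z, 0.0)       # ReLU activations
    err = H @ W2 - y             # residuals, shape (n, 1)
    dZ = (err @ W2.T) * (Z > 0)  # backprop through output layer and ReLU
    dW1 = X.T @ dZ / len(X)      # gradient w.r.t. W1, shape (d, h)
    scores = np.abs(dW1).sum(axis=0)          # one score per hidden neuron
    return set(np.argsort(scores)[::-1][:top_k])

def neuron_overlap(neurons_a, neurons_b):
    """Jaccard overlap between two tasks' neuron sets (0 = disjoint, 1 = identical)."""
    a, b = set(neurons_a), set(neurons_b)
    return len(a & b) / len(a | b)
```

Per the paper's finding, a higher `neuron_overlap` between two tasks' detected neuron sets would indicate stronger cross-task generalization, while low overlap indicates specialization.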

📝 Abstract
While large language models (LLMs) have demonstrated superior multi-task capabilities, understanding the learning mechanisms behind this is still a challenging problem. In this paper, we attempt to understand such mechanisms from the perspective of neurons. Specifically, we detect task-sensitive neurons in LLMs via gradient attribution on task-specific data. Through extensive deactivation and fine-tuning experiments, we demonstrate that the detected neurons are highly correlated with the given task, which we term as task-specific neurons. With these identified task-specific neurons, we delve into two common problems in multi-task learning and continuous learning: Generalization and Catastrophic Forgetting. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. Interestingly, at certain layers of LLMs, there is a high similarity in the parameters of different task-specific neurons, and such similarity is highly correlated with the generalization performance. Inspired by these findings, we propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning, and extensive experiments demonstrate the effectiveness of the proposed method. Our study provides insights into the interpretability of LLMs in multi-task learning.
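The abstract's proposed continual fine-tuning method, updating only the current task's neurons, can be sketched as a masked parameter update. This is a hypothetical helper that assumes each hidden neuron corresponds to a column of a weight matrix; it is not the paper's code:

```python
import numpy as np

def neuron_masked_step(W, grad_W, task_neurons, lr=0.1):
    """One SGD step that updates only the columns (neurons) detected as
    task-specific; all other neurons stay frozen. Freezing non-task
    neurons is the mechanism meant to mitigate catastrophic forgetting.
    """
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[list(task_neurons)] = True
    W_new = W.copy()
    W_new[:, mask] -= lr * grad_W[:, mask]
    return W_new
```

In a continual-learning loop, one would detect the neuron set for each incoming task and route all gradient updates for that task through this mask, leaving earlier tasks' neurons untouched.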
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Multi-task Learning
Knowledge Transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuron-level Adaptation
Multi-task Learning
Parameter Efficiency