NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs

📅 2025-08-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the longstanding trade-off between safety and utility in large language models (LLMs)—where existing methods often suffer from insufficient safety, excessive rejection of benign queries, or degradation in generation quality and general-task performance—this paper proposes NeuronTune. NeuronTune introduces the first fine-grained, cross-layer sparse neuron modulation framework, overcoming the limitations of conventional layer-wise interventions. It identifies safety-critical neurons via attribution analysis and employs a meta-learning mechanism to adaptively calibrate their activation strength, enabling flexible switching between safety-priority and utility-priority regimes. Experiments demonstrate that NeuronTune significantly enhances robustness against adversarial attacks and improves content safety, while preserving text fluency, factual consistency, and performance on standard benchmarks (e.g., MMLU, BBH) with negligible degradation. It consistently outperforms current state-of-the-art approaches across all evaluated dimensions.

📝 Abstract
Ensuring robust safety alignment while preserving utility is critical for the reliable deployment of Large Language Models (LLMs). However, current techniques suffer from intertwined deficiencies: insufficient robustness against malicious attacks, frequent refusal of benign queries, and degradation in generated text quality and general-task performance; the former two reflect deficits in robust safety, while the latter constitutes utility impairment. We trace these limitations to the coarse-grained, layer-wise interventions used in existing methods. To resolve this, we propose NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to optimize safety and utility simultaneously. Our approach first identifies safety-critical and utility-preserving neurons across all layers via attribution, then employs meta-learning to adaptively amplify safety-neuron activations and suppress utility-neuron activations. Crucially, NeuronTune allows the intervention scope to be tuned via neuron-count thresholds, supporting flexible adaptation to security-critical or utility-priority scenarios. Extensive experimental results demonstrate that our method significantly outperforms existing state-of-the-art techniques, achieving superior model safety while maintaining excellent utility.
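The attribution step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual attribution method, scoring function, and cross-layer indexing are not specified here, so a simple first-order activation-times-gradient score is assumed for a single layer.

```python
# Hypothetical sketch: rank neurons by a first-order attribution score
# |activation * gradient| and keep the top-k as "safety-critical".
# All names and inputs below are illustrative, not from the paper.

def attribution_scores(activations, gradients):
    """First-order attribution: |activation * gradient| per neuron."""
    return [abs(a * g) for a, g in zip(activations, gradients)]

def select_neurons(activations, gradients, k):
    """Return indices of the top-k neurons by attribution score."""
    scores = attribution_scores(activations, gradients)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy example: five neurons in one layer
acts = [0.2, -1.5, 0.9, 0.1, 2.0]
grads = [0.5, 0.4, -1.0, 0.05, 0.1]
print(select_neurons(acts, grads, 2))  # → [2, 1]
```

In practice the activations and gradients would come from forward/backward passes over a calibration set of harmful and benign prompts, and the ranking would be computed per layer across the whole model.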
Problem

Research questions and friction points this paper is trying to address.

Balancing safety and utility in the deployment of Large Language Models
Overcoming the coarse-grained, layer-wise intervention limitations of existing methods
Optimizing safety and utility simultaneously via fine-grained neuron modulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained neuron modulation for safety-utility balance
Meta-learning adaptively adjusts neuron activations
Tunable intervention scope via neuron-count thresholds
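The modulation and tunable-scope ideas above can be sketched roughly as follows. The function name, the `amp`/`damp` scale factors, and the `n_intervene` threshold are all invented for illustration; in the paper the activation calibration is learned via meta-learning rather than fixed.

```python
# Hypothetical sketch: amplify safety-neuron activations, suppress
# utility-neuron activations, with a neuron-count cap controlling
# the intervention scope. Parameter names are illustrative.

def modulate(activations, safety_idx, utility_idx,
             amp=1.5, damp=0.5, n_intervene=None):
    """Scale safety neurons up and utility neurons down.

    n_intervene caps how many neurons of each set are touched,
    mimicking the tunable neuron-count threshold: a larger cap
    favors safety-priority, a smaller cap favors utility-priority.
    """
    out = list(activations)
    for i in safety_idx[:n_intervene]:
        out[i] *= amp
    for i in utility_idx[:n_intervene]:
        out[i] *= damp
    return out

# Toy example: neuron 0 is safety-critical, neuron 3 utility-preserving
print(modulate([1.0, 2.0, 3.0, 4.0], [0], [3], amp=2.0, damp=0.5))
# → [2.0, 2.0, 3.0, 2.0]
```

Setting `n_intervene` low leaves most neurons untouched (utility-priority); setting it high applies the intervention broadly (safety-priority).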
Authors
Birong Pan (School of Computer Science, Wuhan University)
Mayi Xu (Wuhan University; Natural Language Processing)
Qiankun Pi (School of Computer Science, Wuhan University)
Jianhao Chen (School of Computer Science, Wuhan University)
Yuanyuan Zhu (School of Computer Science, Wuhan University)
Ming Zhong (School of Computer Science, Wuhan University)
Tieyun Qian (Wuhan University; natural language processing, web data mining)