MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inherent trade-off between safety and usability in large language models (LLMs). The authors propose MoGU_v2, a framework that embeds dynamic routers only in the layers that encode highly discriminative safety features. The architecture couples two branches, a security-optimized and a usability-optimized variant, blended by a hidden-state-aware dynamic weight-allocation strategy, and activates backbone modules during router optimization to enable bidirectional adaptation between routers and hidden states. Combined with a simple data-mix strategy during instruction fine-tuning, MoGU_v2 improves both objectives simultaneously. Extensive evaluation across mainstream, on-device, and reasoning LLM families shows that MoGU_v2 significantly improves robustness against malicious instructions while preserving task response quality, thereby advancing the Pareto frontier beyond conventional single-objective optimization.

📝 Abstract
As Large Language Models (LLMs) increasingly permeate human life, their security has emerged as a critical concern, particularly their ability to maintain harmless responses to malicious instructions. Although extensive methods have improved LLMs' security, they often lead to conservative, rejection-oriented responses that compromise practical usability. This presents a key challenge: how to advance the Pareto frontier between LLMs' usability and security, rather than necessitating a trade-off between them. To address this, we propose the MoGU framework, in which the intra-layer router dynamically allocates weights by sensing hidden states, thereby balancing the contributions of security-optimized and usability-optimized variants. Despite its initial potential, the MoGU framework faces limitations such as parameter redundancy and performance bottlenecks. To overcome these, we further propose an improved MoGU_v2 framework that establishes a tighter coupling between the routers and hidden states. In MoGU_v2, routers are embedded only in layers encoding highly classifiable security features, and backbone modules are activated during router optimization to enable bidirectional adaptation. MoGU_v2 exhibits strong adaptability and stable improvements across various series of LLMs, including mainstream LLMs serving as brains in various applications, on-device LLMs optimized for resource-constrained scenarios, and reasoning LLMs tailored for user interpretability. Meanwhile, even facing risks introduced by instruction fine-tuning, MoGU_v2 can easily restore security without compromising the task performance gains via a simple data-mix strategy. These comprehensive improvements highlight MoGU_v2 as a robust and versatile solution for mitigating security risks in real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Balancing usability and security in large language models
Overcoming conservative rejection-oriented response limitations
Addressing parameter redundancy and performance bottlenecks in security frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intra-layer router dynamically allocates weights by sensing hidden states
Routers embedded only in layers encoding highly classifiable security features
Bidirectional adaptation via backbone modules activated during router optimization
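The routing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the two-direction router parameterization, and the plain-Python vectors are all assumptions made for clarity. The router scores the current hidden state, turns the scores into normalized mixing weights via softmax, and blends the outputs of the security- and usability-optimized branches.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def router_weights(hidden_state, w_sec, w_use):
    """Score the hidden state against two (hypothetical) router
    directions and normalize the scores into mixing weights."""
    score_sec = sum(h * w for h, w in zip(hidden_state, w_sec))
    score_use = sum(h * w for h, w in zip(hidden_state, w_use))
    return softmax([score_sec, score_use])

def mogu_layer_output(hidden_state, sec_out, use_out, w_sec, w_use):
    """Blend the security-branch and usability-branch outputs with
    weights produced by the hidden-state-aware router."""
    a_sec, a_use = router_weights(hidden_state, w_sec, w_use)
    return [a_sec * s + a_use * u for s, u in zip(sec_out, use_out)]
```

In this sketch, a hidden state that aligns with the router's "security" direction pushes the mixture toward the security-optimized branch, and vice versa; MoGU_v2's contribution is restricting such routers to layers with highly classifiable security features and co-training them with the backbone.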
👥 Authors

Yanrui Du — Harbin Institute of Technology (LLMs Safety, Medical Domain)
Fenglei Fan — City University of Hong Kong, Hong Kong
Sendong Zhao — Harbin Institute of Technology (BioNLP, Large Language Model)
Jiawei Cao — SCIR Lab, Harbin Institute of Technology, China
Ting Liu — SCIR Lab, Harbin Institute of Technology, China
Bing Qin — Professor, Harbin Institute of Technology (Natural Language Processing, Information Extraction, Sentiment Analysis)