MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inherent trade-off between safety and usability in large language models (LLMs). The authors propose MoGU_v2, a framework that embeds dynamic routers only in the layers that encode highly discriminative safety features. The architecture couples two branches, a security-optimized and a usability-optimized variant, blended by a hidden-state-aware dynamic weight-allocation strategy, and activates backbone modules during router optimization to enable bidirectional adaptation between routers and hidden states. Combined with a simple data-mix strategy during instruction fine-tuning, MoGU_v2 improves both objectives simultaneously. Extensive evaluation across mainstream, on-device, and reasoning LLM families shows that MoGU_v2 significantly improves robustness against malicious instructions while preserving task response quality, thereby advancing the Pareto frontier beyond conventional single-objective optimization.

📝 Abstract
As Large Language Models (LLMs) increasingly permeate human life, their security has emerged as a critical concern, particularly their ability to maintain harmless responses to malicious instructions. Although extensive methods have improved LLMs' security, they often lead to conservative, rejection-oriented responses that compromise practical usability. This presents a key challenge: how to advance the Pareto frontier between LLMs' usability and security, rather than necessitating a trade-off between them. To address this, we propose the MoGU framework, in which the intra-layer router dynamically allocates weights by sensing hidden states, thereby balancing the contributions of security-optimized and usability-optimized variants. Despite its initial potential, the MoGU framework faces limitations such as parameter redundancy and performance bottlenecks. To overcome these, we further propose an improved MoGU_v2 framework that establishes a tighter coupling between the routers and hidden states. In MoGU_v2, routers are embedded only in layers encoding highly classifiable security features, and backbone modules are activated during router optimization to enable bidirectional adaptation. MoGU_v2 exhibits strong adaptability and stable improvements across various series of LLMs, including mainstream LLMs serving as brains in various applications, on-device LLMs optimized for resource-constrained scenarios, and reasoning LLMs tailored for user interpretability. Meanwhile, even facing risks introduced by instruction fine-tuning, MoGU_v2 can easily restore security without compromising the task performance gains via a simple data-mix strategy. These comprehensive improvements highlight MoGU_v2 as a robust and versatile solution for mitigating security risks in real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Balancing usability and security in large language models
Overcoming conservative rejection-oriented response limitations
Addressing parameter redundancy and performance bottlenecks in security frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intra-layer router dynamically allocates weights by sensing hidden states
Routers embedded only in layers encoding highly classifiable security features
Bidirectional adaptation via backbone modules activated during router optimization
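The routing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the two-direction router parameterization, and the plain-Python vectors are all assumptions made for clarity. The router scores the current hidden state, turns the scores into normalized mixing weights via softmax, and blends the outputs of the security- and usability-optimized branches.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def router_weights(hidden_state, w_sec, w_use):
    """Score the hidden state against two (hypothetical) router
    directions and normalize the scores into mixing weights."""
    score_sec = sum(h * w for h, w in zip(hidden_state, w_sec))
    score_use = sum(h * w for h, w in zip(hidden_state, w_use))
    return softmax([score_sec, score_use])

def mogu_layer_output(hidden_state, sec_out, use_out, w_sec, w_use):
    """Blend the security-branch and usability-branch outputs with
    weights produced by the hidden-state-aware router."""
    a_sec, a_use = router_weights(hidden_state, w_sec, w_use)
    return [a_sec * s + a_use * u for s, u in zip(sec_out, use_out)]
```

In this sketch, a hidden state that aligns with the router's "security" direction pushes the mixture toward the security-optimized branch, and vice versa; MoGU_v2's contribution is restricting such routers to layers with highly classifiable security features and co-training them with the backbone.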
👥 Authors

Yanrui Du — Harbin Institute of Technology (LLMs Safety, Medical Domain)
Fenglei Fan — City University of Hong Kong, Hong Kong
Sendong Zhao — Harbin Institute of Technology (BioNLP, Large Language Model)
Jiawei Cao — SCIR Lab, Harbin Institute of Technology, China
Ting Liu — SCIR Lab, Harbin Institute of Technology, China
Bing Qin — Professor, Harbin Institute of Technology (Natural Language Processing, Information Extraction, Sentiment Analysis)