Exploiting the Experts: Unauthorized Compression in MoE-LLMs

📅 2025-11-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies a novel security vulnerability in Mixture-of-Experts (MoE) large language models: the prunability of expert modules lets adversaries bypass licensing and safety restrictions by selectively removing experts and performing low-cost fine-tuning on the remainder. It presents the first systematic evaluation of authorization vulnerabilities in task-specific MoE deployments, formalizing expert pruning as a dual-use threat. Method: We propose a three-tiered defense: (1) critical expert identification via attribution-based analysis; (2) active-learning-guided realignment after pruning; and (3) expert entanglement training coupled with selective fine-tuning protocols. Contribution/Results: Experiments demonstrate that our approach significantly enhances resilience against unauthorized model compression, achieving superior trade-offs between knowledge retention and functional recovery. It establishes a verifiable, security-aware deployment paradigm for MoE architectures that ensures controllability without compromising utility.

📝 Abstract
Mixture-of-Experts (MoE) architectures are increasingly adopted in large language models (LLMs) for their scalability and efficiency. However, their modular structure introduces a unique vulnerability: adversaries can attempt to compress or repurpose models by pruning experts and cheaply fine-tuning the remainder, effectively bypassing licensing and security constraints. In this paper, we systematically study the prunability of MoE-LLMs under task-specific usage. We first develop an expert attribution framework that identifies the subset of experts most responsible for a given task, then evaluate the performance trade-offs of pruning and re-aligning these experts using active-learning-driven fine-tuning. Our findings reveal a critical knowledge loss–recovery trade-off: while certain experts can be isolated to retain task accuracy, significant degradation occurs without targeted re-alignment. Based on this analysis, we propose defense strategies that aim to make MoE models harder to compress and fine-tune without authorization, including entangled expert training and selective fine-tuning protocols that resist unauthorized adaptation. By positioning expert pruning as both a threat vector and a defense target, this work highlights the dual-use nature of MoE modularity and provides the first systematic evaluation framework for secure specialization of MoE-LLMs.
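The abstract's expert attribution framework can be illustrated with a minimal sketch. The paper does not publish its attribution method, so the scoring rule below is an assumption: score each expert by how often it lands in the router's top-k on task-specific inputs, a common proxy for task responsibility in MoE routing statistics.

```python
import numpy as np

def expert_attribution(gate_probs: np.ndarray, top_k: int) -> np.ndarray:
    """Score each expert's responsibility for a task (illustrative proxy).

    gate_probs: (num_tokens, num_experts) router softmax outputs collected
    while running the model on task-specific data.
    Returns per-expert scores: the fraction of tokens for which the expert
    appears in the router's top-k selection.
    """
    num_tokens, num_experts = gate_probs.shape
    # Indices of the top-k experts per token, as a typical MoE router selects.
    topk_idx = np.argsort(gate_probs, axis=1)[:, -top_k:]
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    return counts / num_tokens

# Toy example: 4 tokens, 3 experts, top-1 routing.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.4, 0.1],
])
scores = expert_attribution(probs, top_k=1)
# Expert 0 wins 3 of 4 tokens, expert 1 wins 1, expert 2 none:
# scores → [0.75, 0.25, 0.0]
```

Experts with near-zero scores are candidates for pruning under a given task, which is exactly the property the paper frames as both a compression opportunity and an attack surface.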
Problem

Research questions and friction points this paper is trying to address.

Unauthorized compression threatens MoE-LLMs through expert pruning
Adversaries bypass licensing via expert pruning plus cheap fine-tuning
How to design defenses that resist unauthorized compression and adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed expert attribution framework for task-specific pruning
Used active learning-driven fine-tuning for expert re-alignment
Proposed entangled expert training as defense strategy
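The paper does not specify how entangled expert training is implemented, but one plausible instantiation (an assumption for illustration) is an entropy-based regularizer added to the training loss: penalize peaked routing so that task knowledge is spread across many experts, which makes any pruned subset lose information.

```python
import numpy as np

def entanglement_penalty(gate_probs: np.ndarray) -> float:
    """Illustrative 'entangled expert training' regularizer (an assumed
    instantiation, not the paper's exact loss). Penalizes peaked routing:
    the gap between the maximum possible routing entropy and each token's
    actual routing entropy, averaged over tokens. Adding this term to the
    training loss pushes tokens to involve many experts, so no small
    expert subset suffices after pruning."""
    eps = 1e-9
    # Per-token routing entropy; high entropy = many experts involved.
    entropy = -(gate_probs * np.log(gate_probs + eps)).sum(axis=1)
    max_entropy = np.log(gate_probs.shape[1])
    return float((max_entropy - entropy).mean())

# Uniform routing over 4 experts: maximally entangled, penalty ≈ 0.
uniform = np.full((2, 4), 0.25)
# One-hot routing: fully separable, penalty ≈ log(4) ≈ 1.386.
peaked = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
```

The trade-off the paper studies appears directly here: a larger penalty weight makes unauthorized compression harder but also works against the sparsity that gives MoE its efficiency, so the weight must be tuned against task utility.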