🤖 AI Summary
Knowledge distillation (KD) accelerates large-model training but raises risks of intellectual property leakage and model homogenization; existing detection methods, which rely on self-identity or output similarity, are vulnerable to prompt-engineering evasion. Method: We propose the first KD detection framework that leverages expert routing patterns in Mixture-of-Experts (MoE) architectures to construct structural-habit fingerprints, enabling efficient identification in both white-box and black-box settings. We model routing behavior as transferable “structural habits” and introduce Shadow-MoE, a technique that generalizes these habits to non-MoE models and black-box APIs. Contribution/Results: By integrating routing analysis, proxy MoE modeling, and multi-dimensional comparison, our method achieves >94% detection accuracy on standard benchmarks, significantly outperforming baselines, while demonstrating strong robustness against prompt-engineering attacks. This validates structural habits as traceable, general-purpose indicators of KD.
📝 Abstract
Knowledge Distillation (KD) accelerates the training of large language models (LLMs) but poses risks to intellectual property protection and LLM diversity. Existing KD detection methods based on self-identity or output similarity can be easily evaded through prompt engineering. We present a KD detection framework that is effective in both white-box and black-box settings by exploiting an overlooked signal: the transfer of MoE "structural habits", especially internal routing patterns. Our approach analyzes how different experts specialize and collaborate across diverse inputs, creating distinctive fingerprints that persist through the distillation process. To extend beyond the white-box setup and MoE architectures, we further propose Shadow-MoE, a black-box method that constructs proxy MoE representations via auxiliary distillation to compare these patterns between arbitrary model pairs. We also establish a comprehensive, reproducible benchmark that offers diverse distilled checkpoints and an extensible framework to facilitate future research. Extensive experiments demonstrate >94% detection accuracy across various scenarios and strong robustness to prompt-based evasion, outperforming existing baselines while highlighting the transfer of structural habits in LLMs.
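The core idea of comparing routing-pattern fingerprints can be illustrated with a minimal toy sketch. All names, expert counts, and activation statistics below are hypothetical; the actual method compares multi-dimensional routing statistics rather than a single histogram, and the divergence measure shown (Jensen-Shannon) is just one plausible choice:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # KL divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def routing_fingerprint(expert_counts):
    """Normalize expert-activation counts over a probe set into a distribution."""
    total = sum(expert_counts)
    return [c / total for c in expert_counts]

# Hypothetical expert-activation counts over 4 experts on a shared probe set
teacher   = routing_fingerprint([120, 40, 30, 10])  # original MoE teacher
suspect   = routing_fingerprint([115, 45, 28, 12])  # candidate distilled model (via Shadow-MoE proxy)
unrelated = routing_fingerprint([50, 50, 50, 50])   # independently trained model

# A distilled model's routing habits should stay closer to the teacher's
print(js_divergence(teacher, suspect) < js_divergence(teacher, unrelated))  # True
```

The design choice here is that routing histograms, unlike raw outputs, are hard to perturb via prompt engineering: evasion prompts change the text a model emits, not which experts (or proxy experts) fire internally.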