ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pretrained models face extraction attacks wherein adversaries exploit domain-invariant features to infer the source domain and enable unauthorized cross-domain transfer. To mitigate this, we propose a targeted weight-space manipulation method that specifically safeguards domain-invariant features. Our approach introduces a novel bilevel optimization framework: the upper level quantifies filter-level transferability to identify sensitive parameters, while the lower level jointly leverages insecure-memory perturbation and Trusted Execution Environment (TEE)-enforced secure execution to protect model weights. The method preserves model performance for authorized downstream tasks while reducing source-domain identification accuracy to near-random levels and degrading cross-domain transfer capability by 74.65%. This is the first work to synergistically integrate transferability quantification, memory perturbation, and TEE-based protection for copyright enforcement in pretrained models, significantly enhancing robustness against model extraction.

📝 Abstract
Pre-trained models are valuable intellectual property, capturing both domain-specific and domain-invariant features within their weight spaces. However, model extraction attacks threaten these assets by enabling unauthorized source-domain inference and facilitating cross-domain transfer via the exploitation of domain-invariant features. In this work, we introduce **ProDiF**, a novel framework that leverages targeted weight space manipulation to secure pre-trained models against extraction attacks. **ProDiF** quantifies the transferability of filters and perturbs the weights of critical filters in unsecured memory, while preserving actual critical weights in a Trusted Execution Environment (TEE) for authorized users. A bi-level optimization further ensures resilience against adaptive fine-tuning attacks. Experimental results show that **ProDiF** reduces source-domain accuracy to near-random levels and decreases cross-domain transferability by 74.65%, providing robust protection for pre-trained models. This work offers comprehensive protection for pre-trained DNN models and highlights the potential of weight space manipulation as a novel approach to model security.
Problem

Research questions and friction points this paper is trying to address.

Protects pre-trained models from extraction attacks
Secures domain-invariant features in weight spaces
Reduces cross-domain transferability and source-domain accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Targeted weight space manipulation for model security
Trusted Execution Environment for critical weight protection
Bi-level optimization against adaptive fine-tuning attacks
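The protection scheme described above can be illustrated with a minimal sketch. The paper's actual transferability metric and bilevel objective are not given on this page, so everything below is a hypothetical stand-in: the norm-based transferability proxy, the `protect_weights`/`restore_weights` helpers, and the use of ordinary Python storage in place of a real TEE are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_transferability(filters):
    # Hypothetical proxy: treat filters with larger norms as more transferable.
    # ProDiF's actual filter-level transferability measure is not specified here.
    return np.linalg.norm(filters.reshape(filters.shape[0], -1), axis=1)

def protect_weights(filters, top_k, noise_scale=1.0):
    """Perturb the top-k most 'transferable' filters in the released copy,
    keeping the originals aside (a stand-in for TEE-secured storage)."""
    scores = filter_transferability(filters)
    critical = np.argsort(scores)[-top_k:]       # indices of sensitive filters
    secured = filters[critical].copy()           # would live inside the TEE
    released = filters.copy()                    # copy placed in unsecured memory
    released[critical] += noise_scale * rng.standard_normal(secured.shape)
    return released, critical, secured

def restore_weights(released, critical, secured):
    # Authorized path: the TEE swaps the true weights back in at inference time.
    restored = released.copy()
    restored[critical] = secured
    return restored

# Toy conv layer: 8 filters of shape 3x3x3.
filters = rng.standard_normal((8, 3, 3, 3))
released, idx, secured = protect_weights(filters, top_k=2)
restored = restore_weights(released, idx, secured)
assert np.allclose(restored, filters)            # authorized users see the original model
```

An attacker extracting `released` from unsecured memory obtains perturbed critical filters, while authorized inference inside the TEE restores the exact weights; the bilevel optimization in the paper additionally shapes the perturbation so fine-tuning cannot easily recover the protected filters.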
Authors

Tong Zhou — Northeastern University
Shijin Duan — Northeastern University
Gaowen Liu — Cisco Research
Charles Fleming — Cisco
R. Kompella — Cisco
Shaolei Ren — UC Riverside
Xiaolin Xu — Northeastern University