Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising

📅 2025-03-21
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work exposes a critical security vulnerability in black-box non-transferable learning (NTL) models: without accessing model weights or performing any fine-tuning, adversarial input perturbations at test time can bypass NTL's domain-isolation mechanism and substantially restore model performance on unauthorized domains. The authors propose JailNTL, a novel attack framework built on a two-level disguising strategy: data-intrinsic disguising (DID), operating in the input space, and model-guided disguising (MGD), operating on the model's output distributions. Together, these components eliminate domain discrepancy, preserve class-discriminative features, and align black-box output statistics. Experiments demonstrate that JailNTL achieves up to a 55.7% accuracy gain on unauthorized domains using only 1% of authorized samples, significantly outperforming existing white-box attacks. This is the first study to empirically establish a substantive security risk for NTL models under black-box deployment.

📝 Abstract
Non-transferable learning (NTL) has been proposed to protect model intellectual property (IP) by creating a "non-transferable barrier" that restricts generalization from authorized to unauthorized domains. Recently, a well-designed attack, which restores unauthorized-domain performance by fine-tuning NTL models on a few authorized samples, has highlighted the security risks of NTL-based applications. However, such an attack requires modifying model weights and is therefore invalid in the black-box scenario. This raises a critical question: can we trust the security of NTL models deployed as black-box systems? In this work, we reveal the first loophole of black-box NTL models by proposing a novel attack method (dubbed JailNTL) that jailbreaks the non-transferable barrier through test-time data disguising. The main idea of JailNTL is to disguise unauthorized data so that it is identified as authorized by the NTL model, thereby bypassing the non-transferable barrier without modifying the NTL model weights. Specifically, JailNTL performs unauthorized-domain disguising at two levels: (i) data-intrinsic disguising (DID), which eliminates domain discrepancy and preserves class-related content at the input level, and (ii) model-guided disguising (MGD), which mitigates output-level statistical differences of the NTL model. Empirically, when attacking state-of-the-art (SOTA) NTL models in the black-box scenario, JailNTL achieves an accuracy increase of up to 55.7% in the unauthorized domain using only 1% of authorized samples, largely exceeding existing SOTA white-box attacks.
Problem

Research questions and friction points this paper is trying to address.

Bypass non-transferable barrier in black-box NTL models
Disguise unauthorized data to appear authorized
Improve unauthorized-domain accuracy without model weight changes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time data disguising bypasses non-transferable barrier
Data-intrinsic disguising eliminates domain discrepancy
Model-guided disguising mitigates output-level statistics difference
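The paper does not publish its implementation here, but the core idea of the second level (aligning black-box output statistics without touching model weights) can be illustrated with a minimal, self-contained sketch. Everything below is an assumption-labeled stand-in, not the authors' method: `black_box_model` is a toy linear classifier playing the role of the deployed NTL model, and `disguise` uses a generic two-point zeroth-order gradient estimate (chosen because the black-box setting gives no gradients) to find an input-space perturbation whose outputs match a small authorized reference batch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def black_box_model(x, W):
    # Stand-in for the deployed NTL model: we only observe its output
    # probabilities, never its weights or gradients.
    return softmax(x @ W)

def output_stat_gap(probs_query, probs_authorized):
    # KL divergence between batch-mean output distributions: a simple
    # proxy for the "output-level statistics difference" MGD targets.
    p = probs_authorized.mean(axis=0) + 1e-12
    q = probs_query.mean(axis=0) + 1e-12
    return float(np.sum(p * np.log(p / q)))

def disguise(x_unauth, x_auth_ref, W, steps=300, sigma=0.1, lr=0.2, seed=0):
    """Search for an additive perturbation (shared across the batch) that
    makes black-box outputs on disguised unauthorized data match the
    authorized reference statistics. Zeroth-order (query-only) descent."""
    rng = np.random.default_rng(seed)
    ref = black_box_model(x_auth_ref, W)
    delta = np.zeros(x_unauth.shape[1])
    best_delta = delta.copy()
    best_gap = output_stat_gap(black_box_model(x_unauth, W), ref)
    for _ in range(steps):
        u = rng.normal(size=delta.shape)  # random probe direction
        f_plus = output_stat_gap(black_box_model(x_unauth + delta + sigma * u, W), ref)
        f_minus = output_stat_gap(black_box_model(x_unauth + delta - sigma * u, W), ref)
        delta -= lr * (f_plus - f_minus) / (2 * sigma) * u  # two-point estimate
        cur = output_stat_gap(black_box_model(x_unauth + delta, W), ref)
        if cur < best_gap:  # keep the best perturbation seen so far
            best_gap, best_delta = cur, delta.copy()
    return best_delta
```

Note what the sketch deliberately omits: JailNTL's data-intrinsic level (DID) additionally constrains the disguised inputs to look like authorized-domain data while preserving class content, whereas this toy only matches output statistics.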