Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the threat of released synthetic images being harvested for unauthorized knowledge distillation and model replication. It introduces WaveGuard, a generator-based protection framework that needs only a single forward pass per image. Operating within a user-specified perturbation budget, WaveGuard uses a frequency-aware perturbation generator to inject structured, imperceptible perturbations, reducing the usefulness of protected images as training data for unauthorized student models while preserving visual fidelity for benign viewers. The design targets three goals together: high image quality, explicit control over perturbation magnitude, and computational efficiency. Experiments in WikiArt-related synthetic-output distillation settings indicate a favorable trade-off among protection efficacy, fidelity, and efficiency, with explicit imperceptibility control.

📝 Abstract

Closed-weight generative services are increasingly deployed through query-based APIs, where users can obtain generated outputs while model parameters remain inaccessible. However, such deployment does not prevent model stealing: an attacker can repeatedly query the service, collect large volumes of released synthetic images, and use them as training data for a private substitute model. This query-output-driven process enables unauthorized knowledge distillation and capability replication without direct access to the original weights. To mitigate this threat, a practical defense should preserve the visual fidelity of released images, provide explicit control over perturbation magnitude, and scale efficiently to large-volume output release. We present WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified perturbation budget. WaveGuard employs a frequency-aware perturbation generator to inject structured, imperceptible perturbations that maintain perceptual utility for benign viewers while reducing the usefulness of protected images as training data for unauthorized student models. Extensive experiments under WikiArt-related synthetic-output distillation settings show that WaveGuard achieves a favorable efficacy–fidelity–efficiency trade-off, with explicit imperceptibility control and substantial gains in protection efficiency.
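
To make the idea of a budget-constrained, frequency-domain perturbation concrete, here is a minimal NumPy sketch. It is not WaveGuard's learned generator: the random band-limited noise, the L-infinity norm, the `low_cut` threshold, and the function name `band_limited_perturbation` are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows how a perturbation can be confined to a frequency band and then scaled so that it never exceeds a user-specified budget `epsilon`.

```python
import numpy as np

def band_limited_perturbation(image, epsilon, low_cut=0.25, seed=0):
    """Add high-frequency noise to `image` with max per-pixel change <= epsilon.

    Illustrative only: a learned generator would replace the random noise.
    `image` is a float array in [0, 1] with shape (H, W) or (H, W, C).
    """
    rng = np.random.default_rng(seed)
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape[:2]

    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fy ** 2 + fx ** 2) >= low_cut  # keep the high band only
    if img.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels

    # Shape the noise in the frequency domain, then return to pixel space.
    spectrum = np.fft.fft2(rng.standard_normal(img.shape), axes=(0, 1))
    delta = np.real(np.fft.ifft2(spectrum * mask, axes=(0, 1)))

    # Rescale so the largest absolute change equals the budget (assumed L-inf).
    peak = np.max(np.abs(delta))
    if peak > 0:
        delta *= epsilon / peak

    # Clipping to the valid range can only shrink the change, never grow it.
    return np.clip(img + delta, 0.0, 1.0)

if __name__ == "__main__":
    demo = np.full((32, 32, 3), 0.5)
    protected = band_limited_perturbation(demo, epsilon=8 / 255)
    print(float(np.max(np.abs(protected - demo))) <= 8 / 255 + 1e-12)
```

The value `8 / 255` is a commonly used perturbation budget in the adversarial-perturbation literature and is used here only as an example. The rescaling step is what gives explicit control over perturbation magnitude in this sketch; how WaveGuard enforces its budget is not described in the abstract.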

Problem

Research questions and friction points this paper is trying to address.

knowledge distillation

model stealing

text-to-image generation

synthetic data

query-based API

Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation defense

frequency-aware perturbation

generative model protection