🤖 AI Summary
Pose-driven animation from a single image can be misused to generate videos involving politics, violence, or other illegal content. To mitigate this risk, the paper proposes Dormant, a protection method tailored to this task. Dormant adds a protective perturbation to a human image that keeps it visually similar to the original while significantly degrading any animation generated from it. The perturbation is optimized toward two failure modes: misextraction of appearance features from the image and incoherence among the generated video frames. The evaluation covers eight animation methods, four datasets, and six real-world commercial services, the latter under fully black-box access. Dormant outperforms six baseline protection methods: the generated videos show misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames, which deters unauthorized or malicious video generation.
📝 Abstract
Pose-driven human image animation has achieved tremendous progress, enabling the generation of vivid and realistic human videos from a single photo. However, this progress also increases the risk of image misuse, as attackers may use one available image to create videos involving politics, violence, and other illegal content. To counter this threat, we propose Dormant, a novel protection approach tailored to defend against pose-driven human image animation techniques. Dormant applies a protective perturbation to a human image, preserving its visual similarity to the original while causing poor-quality video generation. The protective perturbation is optimized to induce misextraction of appearance features from the image and to create incoherence among the generated video frames. Our extensive evaluation across 8 animation methods and 4 datasets demonstrates the superiority of Dormant over 6 baseline protection methods, leading to misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames in the generated videos. Moreover, Dormant is effective against 6 real-world commercial services, even with fully black-box access.
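The idea of a bounded perturbation that pushes an encoder's appearance features away from those of the clean image can be illustrated with a toy sketch. This is not Dormant's implementation: the linear `encode` function, the `protect` helper, the L-infinity budget `eps`, the step size, and the sign-gradient ascent loop are all assumptions chosen for illustration, and the frame-incoherence objective mentioned in the abstract is omitted entirely.

```python
import random

def encode(weights, image):
    """Toy linear stand-in for an appearance encoder (hypothetical)."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]

def feature_gap(weights, image, other):
    """Squared L2 distance between the features of two images."""
    a, b = encode(weights, image), encode(weights, other)
    return sum((u - v) ** 2 for u, v in zip(a, b))

def protect(weights, image, eps=8 / 255, step=2 / 255, iters=20, seed=0):
    """Sign-gradient ascent on the feature gap under an L-infinity budget."""
    rng = random.Random(seed)
    # Random start: the gradient of the gap is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in image]
    clean = encode(weights, image)
    for _ in range(iters):
        perturbed = [p + d for p, d in zip(image, delta)]
        diff = [u - v for u, v in zip(encode(weights, perturbed), clean)]
        for j in range(len(image)):
            # Analytic gradient for a linear encoder: 2 * (W^T diff)[j].
            g = 2 * sum(row[j] * r for row, r in zip(weights, diff))
            sign = (g > 0) - (g < 0)
            d = max(-eps, min(eps, delta[j] + step * sign))  # stay in budget
            delta[j] = max(-image[j], min(1.0 - image[j], d))  # keep pixel in [0, 1]
    return [p + d for p, d in zip(image, delta)]

# Toy data: a 16-pixel "image" and a 4-feature random encoder.
rng = random.Random(1)
image = [rng.random() for _ in range(16)]
weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
protected = protect(weights, image)
max_change = max(abs(q - p) for p, q in zip(image, protected))
gap = feature_gap(weights, image, protected)
```

Here `max_change` stays within the budget `eps`, so the protected image remains close to the original pixel-wise, while `gap` measures how far the toy features have moved. A real animation model would replace the linear encoder with a deep network and compute the gradient by automatic differentiation.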