Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address privacy and ethical risks arising from unauthorized audio-driven manipulation of talking-head avatars in latent diffusion models (LDMs), this paper proposes the first proactive privacy-preserving framework specifically designed for audio-control disentanglement. Methodologically, the authors introduce a co-optimized objective comprising an audio-control nullifying loss and an anti-purification loss, enabling iterative generation of robust, irreversible adversarial perturbations in the latent space. These perturbations disrupt the audio-to-face-animation mapping while resisting diffusion-based purification attacks. Extensive experiments on mainstream LDM-based talking-head models, including AnimateAnyone and SadTalker, demonstrate audio-control nullification rates exceeding 98%. Crucially, the perturbations retain their protective efficacy even after multiple rounds of diffusion-based purification. The framework thus enhances both the controllability and the trustworthiness of generated content, establishing a new benchmark for privacy-aware audio-driven avatar synthesis.

📝 Abstract
Advances in talking-head animation based on Latent Diffusion Models (LDMs) enable the creation of highly realistic, synchronized videos. These fabricated videos are indistinguishable from real ones, increasing the risk of potential misuse for scams, political manipulation, and misinformation. Hence, addressing these ethical concerns has become a pressing issue in AI security. Recent proactive defense studies have focused on countering LDM-based models by adding perturbations to portraits. However, these methods are ineffective at protecting reference portraits from advanced image-to-video animation. The limitations are twofold: 1) they fail to prevent images from being manipulated by audio signals, and 2) diffusion-based purification techniques can effectively eliminate protective perturbations. To address these challenges, we propose Silencer, a two-stage method designed to proactively protect the privacy of portraits. First, a nullifying loss is proposed to ignore audio control in talking-head generation. Second, we apply an anti-purification loss in the LDM to optimize the inverted latent feature and generate robust perturbations. Extensive experiments demonstrate the effectiveness of Silencer in proactively protecting portrait privacy. We hope this work will raise awareness among the AI security community regarding critical ethical issues related to talking-head generation techniques. Code: https://github.com/yuangan/Silencer.
Problem

Research questions and friction points this paper is trying to address.

Prevent audio-controlled manipulation in talking-head videos
Protect portraits from diffusion-based purification attacks
Address ethical risks of realistic AI-generated videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nullify audio control via adversarial examples
Apply anti-purification loss in LDM
Generate robust perturbations for privacy
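The two losses above can be sketched as a projected-gradient loop. The following is a minimal toy illustration, not the paper's implementation: the linear surrogate "generator", the shrinkage "purifier", the dimensions, and the step sizes are all hypothetical stand-ins for the LDM components Silencer actually operates on.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4                               # hypothetical latent / audio feature dims

W = rng.normal(size=(d, d)) / np.sqrt(d)   # surrogate "generator": G(x, a) = Wx + Va
V = rng.normal(size=(d, k)) / np.sqrt(k)
M = 0.5 * np.eye(d)                        # surrogate "purifier": P(x) = Mx (damps perturbations)

x = rng.normal(size=d)                     # portrait latent to protect
a = rng.normal(size=k)                     # driving audio feature
a0 = np.zeros(k)                           # "silent" audio

eps, step, lam, iters = 0.5, 0.05, 1.0, 200
delta = np.zeros(d)                        # adversarial perturbation
hist = []
for _ in range(iters):
    # Nullifying loss: the audio-driven output of the perturbed latent should
    # match the clean latent's silent-audio output, i.e. audio control is ignored.
    r_null = W @ (x + delta) + V @ a - (W @ x + V @ a0)
    # Anti-purification loss: the same property should still hold after the
    # perturbed latent passes through the purifier.
    r_anti = W @ (M @ (x + delta)) + V @ a - (W @ x + V @ a0)
    hist.append(r_null @ r_null + lam * (r_anti @ r_anti))
    # Analytic gradients of the two quadratic losses w.r.t. delta.
    grad = 2 * W.T @ r_null + 2 * lam * M.T @ (W.T @ r_anti)
    # PGD step projected back onto the L_inf ball of radius eps.
    delta = np.clip(delta - step * grad, -eps, eps)
```

Minimizing the co-optimized objective drives the perturbation to cancel the audio's contribution to the output both before and after purification, which is the intuition behind Silencer's robustness claim.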
Yuan Gan
Zhejiang University
Computer Vision · Talking Head Generation · Affective Computing
Jiaxu Miao
Sun Yat-Sen University
Deep Learning · Video Segmentation · Federated Learning
Yunze Wang
Department of Statistics, University of Wisconsin–Madison
Yi Yang
ReLER, CCAI, Zhejiang University