Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

📅 2025-10-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

AI-driven talking-head videoconferencing systems transmit a compact latent that an attacker can manipulate to spoof a victim's identity in real time. This paper proposes a real-time identity-consistency check that operates directly in the latent space, without reconstructing RGB video. A pose-conditioned, large-margin contrastive encoder disentangles identity features from pose and expression features in the transmitted latent, and a lightweight cosine-similarity test on the resulting embedding performs online verification. In experiments on multiple talking-head generation models, the method consistently outperforms existing puppeteering defenses, runs in real time, and generalizes well to out-of-distribution scenarios. The authors describe it as the first defense to detect latent-space puppeteering without inspecting the rendered video.

📝 Abstract

AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression latent and re-synthesizing RGB at the receiver, but this latent can be puppeteered, letting an attacker hijack a victim's likeness in real time. Because every frame is synthetic, deepfake and synthetic video detectors fail outright. To address this security problem, we exploit a key observation: the pose-expression latent inherently contains biometric information of the driving identity. Therefore, we introduce the first biometric leakage defense that never looks at the reconstructed RGB video: a pose-conditioned, large-margin contrastive encoder that isolates persistent identity cues inside the transmitted latent while cancelling transient pose and expression. A simple cosine test on this disentangled embedding flags illicit identity swaps as the video is rendered. Our experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real time, and generalizes well to out-of-distribution scenarios.
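The cosine test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the encoder has already produced identity embeddings as plain lists of floats. The function names, the example vectors, and the threshold of 0.5 are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_identity_swap(reference, observed, threshold=0.5):
    """Flag a frame whose embedding is too dissimilar to the reference.

    `threshold` is an illustrative value, not one reported by the authors.
    """
    return cosine_similarity(reference, observed) < threshold


# Hypothetical embeddings: one reference identity and two incoming frames.
reference = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [-0.3, 0.9, -0.2]

print(is_identity_swap(reference, same_speaker))
print(is_identity_swap(reference, other_speaker))
```

In a deployed system the threshold would presumably be calibrated on held-out data; the sketch only shows the shape of the per-frame decision.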

Problem

Research questions and friction points this paper is trying to address.

Detecting real-time identity hijacking attacks in AI-based videoconferencing systems

Preventing puppeteering of compact pose-expression latents during transmission

Identifying illicit identity swaps without analyzing reconstructed RGB video frames

Innovation

Methods, ideas, or system contributions that make the work stand out.

Biometric leakage defense without RGB video analysis

Pose-conditioned contrastive encoder isolates identity cues

Real-time cosine test flags illicit identity swaps
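The summary does not give the training objective of the contrastive encoder, so the following is only a generic margin-based contrastive loss on cosine similarity, offered as a sketch of the general idea. It has the same form as PyTorch's `CosineEmbeddingLoss`; the function names and the margin of 0.3 are assumptions, and pose conditioning is not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def margin_contrastive_loss(emb_a, emb_b, same_identity, margin=0.3):
    """Generic pairwise contrastive loss (not the paper's exact objective).

    Same-identity pairs are pulled together by penalizing 1 - similarity.
    Different-identity pairs are pushed apart until their similarity
    falls to `margin` or below, after which they contribute no loss.
    """
    sim = cosine_similarity(emb_a, emb_b)
    if same_identity:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Hypothetical embeddings used only to exercise the function.
anchor = [1.0, 0.0]
orthogonal = [0.0, 1.0]

print(margin_contrastive_loss(anchor, anchor, same_identity=True))
print(margin_contrastive_loss(anchor, orthogonal, same_identity=False))
print(margin_contrastive_loss(anchor, anchor, same_identity=False))
```

Under this kind of objective, embeddings of the same speaker cluster together regardless of the pair's pose or expression, which is what makes a single cosine threshold usable at verification time.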

🔎 Similar Papers

On the Feasibility of Fully AI-automated Vishing Attacks