Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing expressive human pose and shape (EHPS) estimation models are built with little attention to security vulnerabilities, while mainstream adversarial attacks either require white-box access or introduce perceptible perturbations, so they fail to reflect realistic threats. To address this, we propose the first imperceptible black-box attack framework specifically designed for EHPS models: it operates without knowledge of model architecture or gradients, relying solely on output queries. Our method models noise in the latent space and uses output feedback to guide directional optimization, iteratively searching for minimal perturbations. The framework achieves both high attack efficacy and visual imperceptibility. On state-of-the-art EHPS models, it increases pose estimation error by 17.27%–58.21% on average, revealing substantive security risks in digital human generation systems under practical black-box and imperceptible conditions.
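To make the reported percentage concrete, here is a minimal sketch of how a relative error increase can be computed. The page does not say which metric underlies the 17.27% to 58.21% figure; mean per-joint position error (MPJPE) is assumed here purely for illustration, and the function names and toy joint values are hypothetical.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def relative_increase(clean_err, adv_err):
    """Percentage by which the adversarial error exceeds the clean error."""
    return 100.0 * (adv_err - clean_err) / clean_err

# Toy data (assumption): 4 joints in 3D, ground truth at the origin.
gt = np.zeros((4, 3))
clean_pred = gt + np.array([3.0, 4.0, 0.0])  # every joint is off by 5 units
adv_pred = gt + np.array([6.0, 8.0, 0.0])    # every joint is off by 10 units

clean_err = mpjpe(clean_pred, gt)
adv_err = mpjpe(adv_pred, gt)
increase = relative_increase(clean_err, adv_err)
```

In this toy case the error doubles, so the relative increase is 100%. A reported increase of 17.27% would correspond to an adversarial error of 1.1727 times the clean error under the same metric.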

📝 Abstract
Expressive human pose and shape (EHPS) estimation is vital for digital human generation, particularly in live-streaming applications. However, most existing EHPS models focus primarily on minimizing estimation errors, with limited attention to potential security vulnerabilities. Current adversarial attacks on EHPS models often require white-box access (e.g., model details or gradients) or generate visually conspicuous perturbations, limiting their practicality and ability to expose real-world security threats. To address these limitations, we propose a novel Unnoticeable Black-Box Attack (UBA) against EHPS models. UBA leverages the latent-space representations of natural images to generate an optimal adversarial noise pattern and iteratively refines its attack potency along an optimized direction in digital space. Crucially, this process relies solely on querying the model's output, requiring no internal knowledge of the EHPS architecture, while guiding the noise optimization toward greater stealth and effectiveness. Extensive experiments and visual analyses demonstrate the superiority of UBA. Notably, UBA increases the pose estimation errors of EHPS models by 17.27%-58.21% on average, revealing critical vulnerabilities. These findings underscore the urgent need to address and mitigate security risks associated with digital human generation systems.
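The general idea of a query-only attack that searches over a latent noise code can be sketched as follows. This is not the UBA algorithm, whose details are not given on this page: it swaps in a plain random search, and the linear `decode` map, the linear stand-in for the pose model, the `eps` budget, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def black_box_latent_attack(query, image, decode, latent_dim,
                            eps=4 / 255, steps=200, sigma=0.5, seed=0):
    """Random search over a latent noise code using only model outputs.

    query  : callable, image -> predicted joints (the only model access)
    decode : callable, latent code -> image-shaped noise (hypothetical)
    """
    rng = np.random.default_rng(seed)
    clean_pred = query(image)  # reference output on the clean image

    def perturb(z):
        # Bound the decoded noise so each pixel changes by at most eps.
        noise = np.clip(decode(z), -eps, eps)
        return np.clip(image + noise, 0.0, 1.0)

    def score(z):
        # Mean per-joint deviation from the clean prediction.
        diff = query(perturb(z)) - clean_pred
        return float(np.linalg.norm(diff, axis=-1).mean())

    z = np.zeros(latent_dim)
    best = score(z)
    queries = 2  # one clean query plus the initial score
    for _ in range(steps):
        candidate = z + sigma * rng.standard_normal(latent_dim)
        s = score(candidate)
        queries += 1
        if s > best:  # keep only moves that raise the output error
            z, best = candidate, s
    return perturb(z), best, queries

# Toy setup (assumption): 8x8 grayscale image, linear decoder, linear "model".
rng = np.random.default_rng(1)
H = W = 8
image = rng.random((H, W))
A = 0.01 * rng.standard_normal((H * W, 16))
M = rng.standard_normal((5 * 3, H * W))

def decode(z):
    return (A @ z).reshape(H, W)

def query(x):
    return (M @ x.ravel()).reshape(5, 3)

eps = 4 / 255
adv, err, n_queries = black_box_latent_attack(query, image, decode,
                                              latent_dim=16, eps=eps)
```

The loop never touches gradients or weights; it only compares outputs returned by `query`, which is what makes the setting black-box. The per-pixel clip to `eps` is one simple way to keep the change small, whereas the abstract attributes UBA's stealth to noise derived from latent representations of natural images.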
Problem

Research questions and friction points this paper is trying to address.

Exposing security vulnerabilities in EHPS models
Developing black-box attacks without model details
Generating stealthy adversarial noise for pose estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages latent-space representations for adversarial noise
Optimizes noise direction for stealth and effectiveness
Operates purely black-box without model internals
Authors

Zhiying Li
Jinan University
Computer Vision, Low-quality Image Analysis, AI Security

Guanggang Geng
Jinan University
adversarial information retrieval, machine learning, statistical ranking

Yeying Jin
Tencent | National University of Singapore
Computer Vision, AIGC, GenAI, MLLM, VLM

Zhizhi Guo
Teleai

Bruce Gu
Lecturer, Victoria University
Fog/Edge Computing, EdgeAI, IoTs, SDN, Privacy Preserving

Jidong Huo
National Supercomputer Center; Qilu University of Technology

Zhaoxin Fan
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University; Hangzhou International Innovation Institute; Beihang University

Wenjun Wu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University; Hangzhou International Innovation Institute; Beihang University