🤖 AI Summary
Existing work on expressive human pose and shape (EHPS) estimation largely overlooks security vulnerabilities, while mainstream adversarial attacks either require white-box access or introduce perceptible perturbations, and so fail to reflect realistic threats. To address this, we propose the first imperceptible black-box attack framework specifically designed for EHPS models: it operates without knowledge of model architecture or gradients, relying solely on output queries. Our method models noise in the latent space and uses output feedback to guide directional optimization, iteratively searching for minimal perturbations. The framework achieves both high attack efficacy and visual imperceptibility. On state-of-the-art EHPS models, it increases pose estimation error by 17.27%–58.21% on average, revealing, for the first time, substantive security risks in digital human generation systems under practical black-box and imperceptible conditions.
📝 Abstract
Expressive human pose and shape (EHPS) estimation is vital for digital human generation, particularly in live-streaming applications. However, most existing EHPS models focus primarily on minimizing estimation errors, with limited attention to potential security vulnerabilities. Current adversarial attacks on EHPS models often require white-box access (e.g., model details or gradients) or generate visually conspicuous perturbations, which limits their practicality and their ability to expose real-world security threats. To address these limitations, we propose a novel Unnoticeable Black-Box Attack (UBA) against EHPS models. UBA leverages the latent-space representations of natural images to generate an optimal adversarial noise pattern and iteratively refines its attack potency along an optimized direction in digital space. Crucially, this process relies solely on querying the model's output and requires no internal knowledge of the EHPS architecture, while the output feedback guides the noise optimization toward greater stealth and effectiveness. Extensive experiments and visual analyses demonstrate the superiority of UBA. Notably, UBA increases the pose estimation errors of EHPS models by 17.27%–58.21% on average, revealing critical vulnerabilities. These findings underscore the urgent need to address and mitigate security risks associated with digital human generation systems.
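To make the query-only setting concrete, the following is a minimal sketch of a score-based black-box attack loop. It is not the UBA algorithm: the abstract does not give implementation details, so this example swaps in plain random search in pixel space (accept a candidate only if the queried error grows) instead of latent-space noise modeling. The model `toy_model`, the helper names, the budget `eps`, and the step count are all hypothetical and chosen only for illustration. The last helper shows how a relative error increase, such as the percentages reported above, would be computed.

```python
import random


def toy_model(image):
    """Hypothetical stand-in for a black-box EHPS model (NOT a real one).

    Takes a flat list of pixel values and returns three 'pose' numbers.
    The attacker only ever sees this output, never gradients or weights.
    """
    n = len(image)
    return [
        sum(image) / n,
        sum(x * (i % 2) for i, x in enumerate(image)) / n,
        sum(x * x for x in image) / n,
    ]


def pose_error(pred, target):
    """Mean absolute error between predicted and reference pose values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def query_attack(model, image, target, eps=0.03, steps=200, seed=0):
    """Random-search attack using only output queries (illustrative).

    Keeps the perturbation inside an L-infinity budget `eps` so that it
    stays small, and accepts a candidate only when the queried error
    increases. Returns the perturbation and the final error.
    """
    rng = random.Random(seed)
    delta = [0.0] * len(image)
    best = pose_error(model(image), target)
    for _ in range(steps):
        # Propose a small random step per pixel, clipped to the budget.
        cand = [
            max(-eps, min(eps, d + rng.choice((-eps / 4, eps / 4))))
            for d in delta
        ]
        # Keep the adversarial image in the valid pixel range [0, 1].
        adv = [min(1.0, max(0.0, x + c)) for x, c in zip(image, cand)]
        err = pose_error(model(adv), target)  # one black-box query
        if err > best:
            delta, best = cand, err
    return delta, best


def relative_increase(clean_err, adv_err):
    """Percentage increase of the attacked error over the clean error."""
    return 100.0 * (adv_err - clean_err) / clean_err


# Usage on synthetic data: a 16-pixel 'image' and a reference pose that
# is offset slightly from the clean prediction so the clean error is nonzero.
image = [0.2 + 0.6 * i / 15 for i in range(16)]
clean_pred = toy_model(image)
target = [p + 0.01 for p in clean_pred]
clean_err = pose_error(clean_pred, target)
delta, adv_err = query_attack(toy_model, image, target)
print(f"clean error: {clean_err:.4f}, attacked error: {adv_err:.4f}")
print(f"relative increase: {relative_increase(clean_err, adv_err):.1f}%")
```

The accept-if-worse rule means the error never decreases across iterations, and the clipping step bounds every pixel change by `eps`. A real attack on an EHPS model would need a far more query-efficient search and a perceptual stealth criterion, which this toy loop does not attempt.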