Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

📅 2026-05-17

🤖 AI Summary
This work addresses the limited transferability of physical adversarial attacks. Existing attacks tend to overfit to a single surrogate model, and conventional ensemble methods suffer from gradient conflicts within the constrained physical texture space, which degrades cross-model generalization. The authors propose a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) with three components. First, it selects the surrogate model ensemble through quantitative similarity analysis. Second, a dual-level mechanism jointly suppresses output predictions and flattens intermediate feature distributions. Third, an Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts. In simulated and real-world experiments, the authors report that JMOF outperforms state-of-the-art baselines against diverse black-box detectors. They also report cross-task generalization: the generated attacks deceive object detection models together with semantic segmentation or monocular depth estimation models. The framework is positioned as a tool for evaluating the vulnerability of deployed vision systems.
📝 Abstract
Physical adversarial attacks often overfit single surrogate models and optimization objectives. While ensemble attacks can mitigate this, existing methods struggle with severe gradient conflicts within restricted physical texture spaces, significantly degrading cross-model transferability. To bridge this gap, this paper proposes a Joint Multi-Objective and Multi-Model Optimization Framework (JMOF) that leverages quantitative similarity analysis to select the optimal surrogate model ensemble. Within JMOF, a dual-level mechanism jointly suppresses prediction outputs and flattens intermediate feature distributions, balancing attack efficiency with deep generalization. Additionally, an Orthogonal Gradient Alignment (OGA) strategy resolves cross-model gradient conflicts, transforming mutually repulsive gradients into synergistic optimization directions. Extensive simulated and real-world experiments demonstrate that JMOF outperforms state-of-the-art baselines against diverse black-box detectors. Crucially, JMOF exhibits substantial cross-vision-task generalization, generating attacks capable of simultaneously deceiving object detection and semantic segmentation or monocular depth estimation models. This research advances the generalization limits of physical adversarial attacks, providing a robust framework for evaluating visual AI vulnerabilities in real-world deployments.
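The abstract does not spell out how OGA is computed, so the following is only a minimal sketch of the general idea of resolving gradient conflicts across surrogate models. It substitutes a PCGrad-style projection (removing the component of one gradient that points against another) for the paper's actual method. The function names `project_out_conflict` and `combine_gradients`, and the toy two-dimensional gradients, are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Sequence[float], other: Sequence[float]) -> Vector:
    """Return g with its component along `other` removed, but only when the
    two gradients conflict (negative inner product). Otherwise return g as is.
    This is a PCGrad-style stand-in, NOT the paper's OGA."""
    d = dot(g, other)
    norm_sq = dot(other, other)
    if d >= 0.0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [x - scale * y for x, y in zip(g, other)]


def combine_gradients(grads: Sequence[Sequence[float]]) -> Vector:
    """Average per-model gradients after projecting each one against the
    original gradients of every other model."""
    if not grads:
        raise ValueError("need at least one gradient")
    adjusted = []
    for i, g in enumerate(grads):
        g_adj = list(g)
        for j, other in enumerate(grads):
            if i != j:
                g_adj = project_out_conflict(g_adj, other)
        adjusted.append(g_adj)
    n = len(adjusted)
    return [sum(col) / n for col in zip(*adjusted)]


# Toy example: texture gradients from two hypothetical surrogate models
# that partly oppose each other (their inner product is -1).
g_model_a = [1.0, 0.0]
g_model_b = [-1.0, 1.0]
combined = combine_gradients([g_model_a, g_model_b])
print(combined)  # [0.25, 0.75]
```

In this toy case a plain average gives [0.0, 0.5], which makes no progress on the first model's objective, whereas the projected update has a positive inner product with both original gradients. Whether OGA behaves the same way cannot be confirmed from the abstract alone.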
Problem

Research questions and friction points this paper is trying to address.

physical adversarial attacks
cross-model transferability
gradient conflicts
multi-model optimization
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physical Adversarial Attacks
Multi-Model Optimization
Gradient Alignment
Cross-Task Generalization
Surrogate Model Ensemble
Ziyang Liu
Research Fellow, Harvard Medical School; PhD, Tsinghua University
AI4Bio, Graph Embedding, Large Language Model
Hongyuan Wang
Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China; Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
Zijian Wang
Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China
Yinxi Lu
Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China
Yunzhao Zang
Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China
Zhiqiang Yan
National University of Singapore
3D computer vision, depth perception, occupancy prediction
Qianhao Ning
Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China