Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion

πŸ“… 2025-05-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the lack of service reliability and QoS guarantees under adversarial attacks in service-oriented vision-language navigation (VLN) systems. We propose AdvOFβ€”the first adversarial object fusion framework tailored to service computing scenarios. AdvOF generates physically realizable 3D adversarial objects via 2D/3D spatial alignment, multi-view weighted co-optimization, and dual regularization on both VLM perceptual features and physical attributes. This enables precise perturbation of the VLM perception module while preserving original task performance with negligible degradation (<1.2% success rate loss). Evaluated across multiple VLN benchmarks, AdvOF reduces navigation success rates by an average of 42.7%, providing the first empirical evidence of service-level security vulnerabilities in VLN. Our framework establishes theoretical foundations and practical methodologies for designing robust, service-composable VLN systems.

πŸ“ Abstract
We present Adversarial Object Fusion (AdvOF), a novel attack framework targeting vision-and-language navigation (VLN) agents in service-oriented environments by generating adversarial 3D objects. While foundational models like Large Language Models (LLMs) and Vision Language Models (VLMs) have enhanced service-oriented navigation systems through improved perception and decision-making, their integration introduces vulnerabilities in mission-critical service workflows. Existing adversarial attacks fail to address service computing contexts, where reliability and quality-of-service (QoS) are paramount. We use AdvOF to investigate the impact of adversarial environments on the VLM-based perception module of VLN agents. In particular, AdvOF first precisely aggregates and aligns the victim object positions in both 2D and 3D space, defining and rendering adversarial objects. It then optimizes the adversarial object collaboratively, with regularization between the adversarial and victim object across both physical properties and VLM perceptions. By assigning importance weights to the different views, the optimization proceeds stably across views through iterative fusion of local updates and alignment corrections. Our extensive evaluations demonstrate AdvOF can effectively degrade agent performance under adversarial conditions while maintaining minimal interference with normal navigation tasks. This work advances the understanding of service security in VLM-powered navigation systems, providing computational foundations for robust service composition in physical-world deployments.
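The multi-view weighted co-optimization with dual regularization described above can be summarized, under assumed notation (the paper's exact symbols and loss definitions are not given on this page), roughly as:

```latex
% o: victim object, \delta: adversarial perturbation, R_v: renderer for view v,
% f_{\mathrm{VLM}}: perception module, w_v: importance weight of view v (assumed notation)
\min_{\delta} \;
\sum_{v \in \mathcal{V}} w_v \,
\mathcal{L}_{\mathrm{adv}}\!\big( f_{\mathrm{VLM}}(R_v(o + \delta)) \big)
\;+\; \lambda_{\mathrm{phys}} \, \mathcal{R}_{\mathrm{phys}}(o + \delta,\, o)
\;+\; \lambda_{\mathrm{feat}} \, \mathcal{R}_{\mathrm{feat}}\!\big( f_{\mathrm{VLM}}(R_v(o + \delta)),\, f_{\mathrm{VLM}}(R_v(o)) \big)
```

Here the adversarial term degrades the VLM's perception of the object, while the two regularizers (on physical attributes and on perceptual features) keep the object physically realizable and close to the victim object, matching the "dual regularization" in the summary.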
Problem

Research questions and friction points this paper is trying to address.

Targeting VLN agents with adversarial 3D objects
Exploring vulnerabilities in VLM-based perception modules
Degrading agent performance while minimizing navigation interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates adversarial 3D objects for attacks
Aligns and optimizes objects in 2D/3D space
Multi-view iterative fusion with importance weights
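The multi-view iterative fusion above can be sketched as a gradient loop. This is an illustrative stand-in, not the authors' implementation: the renderer, loss, and regularizer interfaces (`render_views`, `perception_loss`, `physical_reg`) are hypothetical names for the components the paper describes.

```python
import numpy as np

def optimize_adversarial_object(texture, render_views, perception_loss,
                                physical_reg, view_weights,
                                steps=100, lr=0.01, lam=0.1):
    """Illustrative multi-view weighted co-optimization (assumed interfaces).

    Each view contributes a local update (its attack gradient); the
    importance-weighted fusion of these updates perturbs VLM perception,
    while the physical regularizer keeps the object close to the victim.
    """
    for _ in range(steps):
        fused_grad = np.zeros_like(texture)
        for view, w in zip(render_views, view_weights):
            img = view(texture)                       # render object from this view
            g = perception_loss.grad(img, texture)    # local update: attack gradient
            fused_grad += w * g                       # importance-weighted fusion
        fused_grad += lam * physical_reg.grad(texture)  # physical-attribute regularization
        texture -= lr * fused_grad                    # gradient step on object parameters
    return texture
```

In practice the view weights would be set from each view's alignment quality, so poorly aligned renders contribute less to the fused update.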
πŸ”Ž Similar Papers
No similar papers found.
Chunlong Xie
College of Computer Science, Chongqing University, Chongqing, China, 400044
Jialing He
College of Computer Science, Chongqing University, Chongqing, China, 400044
Shangwei Guo
Chongqing University
AI System Security · Data Privacy
Jiacheng Wang
Nanyang Technological University
ISAC · GenAI · Low-altitude wireless network · Semantic Communications
Shudong Zhang
School of Computer Science and Technology, Xidian University, Xi’an, China, 710071
Tianwei Zhang
College of Computing and Data Science, Nanyang Technological University, Singapore 639798
Tao Xiang
College of Computer Science, Chongqing University, Chongqing, China, 400044