If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

📅 2026-04-21

🤖 AI Summary

This work addresses the vulnerability of embodied vision-language agents to misleading visual injections: crafted signals that resemble legitimate environmental cues (such as traffic lights) but override user intent. The authors call the resulting tension trust boundary confusion, in which an agent must follow legitimate cues while resisting misleading ones. They introduce a dual-intent dataset and an evaluation framework, and use them to evaluate seven large vision-language model (LVLM) agents across multiple embodied settings under both structure-based and noise-based visual injections. The evaluated agents fail to balance the trade-off reliably, either ignoring useful signals or following harmful ones. As a mitigation, the authors propose a multi-agent defense framework that separates perception from decision-making to dynamically assess the reliability of visual inputs. According to the abstract, this approach significantly reduces misleading behaviors while preserving correct responses, and provides robustness guarantees under adversarial perturbations.
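The trade-off described above can be made concrete with two rates: how often an agent follows legitimate cues (ideally high) and how often it follows misleading ones (ideally low). The sketch below is only an illustration under assumed conventions; the function name `follow_rates`, the trial format, and the metric names are hypothetical and are not taken from the paper's evaluation framework.

```python
from typing import Iterable, Tuple


def follow_rates(trials: Iterable[Tuple[bool, bool]]) -> dict:
    """Compute per-intent follow rates.

    Each trial is (is_legitimate, followed_cue). This trial format is an
    assumption made for illustration, not the paper's data schema.
    """
    # counts[is_legitimate] = [number of followed cues, total trials]
    counts = {True: [0, 0], False: [0, 0]}
    for is_legitimate, followed_cue in trials:
        counts[is_legitimate][0] += int(followed_cue)
        counts[is_legitimate][1] += 1

    def rate(is_legitimate: bool) -> float:
        followed, total = counts[is_legitimate]
        return followed / total if total else float("nan")

    return {
        "legitimate_follow_rate": rate(True),   # should be high
        "misleading_follow_rate": rate(False),  # should be low
    }


if __name__ == "__main__":
    demo = [(True, True), (True, False), (False, True), (False, False)]
    print(follow_rates(demo))
```

An agent that ignores every cue scores low on both rates, and one that obeys every cue scores high on both; reporting the two rates side by side exposes either failure mode.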

📝 Abstract

Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing security risks. This duality creates a fundamental challenge: agents must respond to legitimate environmental cues while remaining robust to misleading ones. We refer to this tension as trust boundary confusion. To study this behavior, we design a dual-intent dataset and evaluation framework, through which we show that current LVLM-based agents fail to reliably balance this trade-off, either ignoring useful signals or following harmful ones. We systematically evaluate 7 LVLM agents across multiple embodied settings under both structure-based and noise-based visual injections. To address these vulnerabilities, we propose a multi-agent defense framework that separates perception from decision-making to dynamically assess the reliability of visual inputs. Our approach significantly reduces misleading behaviors while preserving correct responses and provides robustness guarantees under adversarial perturbations. The code of the evaluation framework and artifacts are made available at https://anonymous.4open.science/r/Visual-Prompt-Inject.
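To illustrate what separating perception from decision-making could look like, here is a minimal sketch. It is not the authors' framework: the `Cue` fields, the `reliability` score, the 0.7 threshold, and the rule that only a trusted "stop" cue overrides the user goal are all assumptions chosen for illustration. In a real system, both stages would presumably be LVLM-backed agents rather than plain functions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cue:
    text: str           # instruction-like content read from the scene
    source: str         # hypothetical label, e.g. "traffic_light"
    reliability: float  # hypothetical score in [0, 1]


def perceive(scene: List[dict]) -> List[Cue]:
    """Perception stage: report cues as data; it never chooses an action."""
    return [Cue(**item) for item in scene]


def decide(user_goal: str, cues: List[Cue], threshold: float = 0.7) -> dict:
    """Decision stage: act on the user goal, letting only trusted cues
    override it. Untrusted cues are recorded but otherwise ignored."""
    trusted = [c for c in cues if c.reliability >= threshold]
    ignored = [c for c in cues if c.reliability < threshold]
    action = user_goal
    if any(c.text == "stop" for c in trusted):
        action = "stop"
    return {
        "action": action,
        "trusted": [c.source for c in trusted],
        "ignored": [c.source for c in ignored],
    }


if __name__ == "__main__":
    scene = [
        {"text": "stop", "source": "handwritten_sign", "reliability": 0.2},
    ]
    print(decide("drive_forward", perceive(scene)))
```

The design point is that text found in the scene reaches the decision stage only as structured data with a reliability estimate attached, so an injected sign cannot act as an instruction unless the decision stage chooses to trust it.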

Problem

Research questions and friction points this paper is trying to address.

Trust Boundary Confusion

Visual Injections

Vision-Language Agentic Systems

Security Risks

Environmental Signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

trust boundary confusion

visual injection

vision-language agentic systems