🤖 AI Summary
This work reveals a critical jailbreaking vulnerability in multimodal large language model (MLLM)-driven vision-and-language navigation (VLN) agents: adversarial prompts can circumvent safety constraints and induce hazardous physical actions. To assess this risk systematically, we propose the first three-tier jailbreaking attack framework tailored to embodied navigation agents, covering four distinct malicious intent categories and constructing multi-intent adversarial queries via prompt stitching. We conduct unified evaluations across both the Matterport3D simulation environment and a real-world robotic platform, targeting five state-of-the-art MLLM-based navigators, and observe an average attack success rate exceeding 90%. Crucially, we provide the first empirical validation on physical robots, successfully triggering dangerous behaviors including collisions and boundary violations. These findings expose a previously unrecognized security flaw in MLLM-powered navigation systems: their susceptibility to real-world physical harm through prompt-based exploitation.
📝 Abstract
Multimodal large language models (MLLMs) have recently gained attention for their generalization and reasoning capabilities in Vision-and-Language Navigation (VLN) tasks, giving rise to MLLM-driven navigators. However, MLLMs are vulnerable to jailbreak attacks, in which crafted prompts bypass safety mechanisms and trigger undesired outputs. In embodied scenarios, such vulnerabilities pose far greater risks: whereas a jailbroken text-only model merely generates toxic content, an embodied agent may interpret malicious instructions as executable commands, potentially causing real-world harm. In this paper, we present the first systematic jailbreak attack paradigm targeting MLLM-driven navigators. We propose a three-tiered attack framework and construct malicious queries across four intent categories, concatenating them with standard navigation instructions. In the Matterport3D simulator, we evaluate navigation agents powered by five MLLMs and report an average attack success rate of over 90%. To test real-world feasibility, we replicate the attack on a physical robot. Our results show that carefully crafted prompts can induce harmful actions and intents in MLLMs, posing risks that go beyond toxic output and can lead to physical harm.
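To make the prompt-stitching step concrete, below is a minimal Python sketch of how a multi-intent adversarial query could be assembled by concatenating placeholder malicious queries with a standard navigation instruction. The function name `stitch_prompt`, the category labels, and the payload strings are all hypothetical illustrations; the paper's actual adversarial queries and intent taxonomy are deliberately not reproduced here.

```python
"""Minimal sketch of the prompt-stitching idea described in the abstract.

All category labels, template strings, and function names are hypothetical
placeholders for illustration; the paper's actual adversarial queries and
four intent categories are not reproduced.
"""

from typing import Iterable

# Hypothetical malicious-intent categories: the paper defines four, but
# these labels and payloads are assumptions, kept as inert placeholders
# rather than working jailbreak text.
MALICIOUS_QUERIES = {
    "intent_1": "<adversarial query for intent category 1>",
    "intent_2": "<adversarial query for intent category 2>",
    "intent_3": "<adversarial query for intent category 3>",
    "intent_4": "<adversarial query for intent category 4>",
}


def stitch_prompt(nav_instruction: str, queries: Iterable[str]) -> str:
    """Concatenate a benign navigation instruction with one or more
    adversarial queries to form a multi-intent stitched prompt."""
    return " ".join([nav_instruction, *queries])


if __name__ == "__main__":
    benign = "Walk down the hallway and stop at the second door on the left."
    # A two-intent stitched query, as one example of the multi-intent setup.
    stitched = stitch_prompt(
        benign, [MALICIOUS_QUERIES["intent_1"], MALICIOUS_QUERIES["intent_2"]]
    )
    print(stitched)
```

Under this reading of the abstract, the stitched string would then be passed to the navigator in place of the benign instruction alone, which is what allows the malicious intent to ride along with an otherwise ordinary navigation request.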