Preventing Robotic Jailbreaking via Multimodal Domain Adaptation

📅 2025-09-27

🤖 AI Summary

Vision-Language Models (VLMs) deployed in robotic scenarios are vulnerable to jailbreak attacks, and existing data-driven defenses generalize poorly. To address both problems, this paper proposes a lightweight multimodal jailbreak detection framework. The method fuses textual semantic representations with attention features from the visual scene, and it aligns generic jailbreak data with robot-specific data across modalities, which enables domain-adaptive detection without extensive annotation. Multimodal representation learning and attention-guided image-text embedding fusion are integrated end to end. Evaluated on robotic applications including autonomous driving, maritime robotics, and quadruped navigation, the framework achieves near-perfect detection accuracy (>99%) with minimal inference overhead, which makes it practical to deploy. Its stated core contribution is the first incorporation of domain adaptation into VLM jailbreak detection, improving generalization and robustness when domain-specific data is scarce.

📝 Abstract

Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors in the real world. Data-driven defenses such as jailbreak classifiers show promise, yet they struggle to generalize in domains where specialized datasets are scarce, limiting their effectiveness in robotics and other safety-critical contexts. To address this gap, we introduce J-DAPT, a lightweight framework for multimodal jailbreak detection through attention-based fusion and domain adaptation. J-DAPT integrates textual and visual embeddings to capture both semantic intent and environmental grounding, while aligning general-purpose jailbreak datasets with domain-specific reference data. Evaluations across autonomous driving, maritime robotics, and quadruped navigation show that J-DAPT boosts detection accuracy to nearly 100% with minimal overhead. These results demonstrate that J-DAPT provides a practical defense for securing VLMs in robotic applications. Additional materials are made available at: https://j-dapt.github.io.
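The attention-based fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not J-DAPT's actual architecture, which the abstract does not specify. It assumes a single text embedding used as the query over a handful of image-patch embeddings, with plain scaled dot-product attention; the function names `softmax` and `attention_fuse` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(text_emb, patch_embs):
    # Assumption: the text embedding acts as the attention query and the
    # image-patch embeddings act as both keys and values.
    d = len(text_emb)
    scores = [
        sum(t * p for t, p in zip(text_emb, patch)) / math.sqrt(d)
        for patch in patch_embs
    ]
    weights = softmax(scores)
    # Weighted sum of patch embeddings: the visual context most relevant to the text.
    attended = [
        sum(w * patch[j] for w, patch in zip(weights, patch_embs))
        for j in range(d)
    ]
    # Fused representation: text embedding concatenated with attended visual context.
    return text_emb + attended, weights

# Toy inputs (hypothetical): one 2-d text embedding and three 2-d patch embeddings.
text_emb = [1.0, 0.0]
patch_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = attention_fuse(text_emb, patch_embs)
```

In a detector, a fused vector like `fused` would be passed to a classifier that labels the prompt as benign or jailbreak. Patches that align with the text embedding receive the largest attention weights, so the visual context is summarized relative to what the prompt is asking for.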

Problem

Research questions and friction points this paper is trying to address.

Detecting multimodal jailbreak attacks on vision-language models in robotics

Improving generalization of safety mechanisms with scarce domain-specific data

Securing robotic systems against harmful behaviors through domain adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal jailbreak detection via attention-based fusion

Domain adaptation aligning general and specific datasets

Integration of textual and visual embeddings to capture semantic intent and environmental grounding
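The idea of aligning a general dataset with domain-specific reference data can be sketched with a simple feature-statistics match. The source does not say which adaptation technique J-DAPT uses, so this example swaps in per-dimension mean and standard-deviation alignment, a simplified relative of correlation alignment (CORAL). The function name `align_to_target` and the toy feature rows are hypothetical.

```python
import statistics

def align_to_target(source, target, eps=1e-8):
    # Assumption: rows are embedding vectors; each feature dimension of the
    # general-purpose (source) data is rescaled to match the mean and
    # standard deviation of the domain-specific (target) reference data.
    dims = len(source[0])
    aligned_cols = []
    for j in range(dims):
        s_col = [row[j] for row in source]
        t_col = [row[j] for row in target]
        s_mu, t_mu = statistics.fmean(s_col), statistics.fmean(t_col)
        s_sd, t_sd = statistics.pstdev(s_col), statistics.pstdev(t_col)
        # Standardize with source statistics, then re-color with target statistics.
        aligned_cols.append(
            [(x - s_mu) / (s_sd + eps) * t_sd + t_mu for x in s_col]
        )
    # Transpose the columns back into rows.
    return [list(row) for row in zip(*aligned_cols)]

# Toy data (hypothetical): general jailbreak features vs. robot-domain reference features.
general = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]]
robot_ref = [[1.0, 0.0], [1.5, 1.0], [2.0, 2.0]]
aligned = align_to_target(general, robot_ref)
```

After alignment, a classifier trained on `aligned` sees features with the same first- and second-order statistics as the robot-domain reference set, which is one common way to reduce domain shift when labeled in-domain data is scarce.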
