🤖 AI Summary
To address both the vulnerability of Vision-Language Models (VLMs) to jailbreak attacks in robotic scenarios and the poor generalizability of existing data-driven defenses, this paper proposes J-DAPT, a lightweight multimodal jailbreak detection framework. The method fuses textual semantic representations with attention-weighted visual scene features. It enables domain-adaptive detection without extensive annotation by aligning general-purpose jailbreak data with robot-specific data across modalities. Multimodal representation learning and attention-guided image-text embedding fusion are integrated in a single end-to-end pipeline. The framework was evaluated on real-world robotic applications, including autonomous driving, maritime robotics, and quadruped navigation, where it achieves near-perfect detection accuracy (>99%) with minimal inference overhead, which makes it practical to deploy. Its core contribution is the first incorporation of domain adaptation into VLM jailbreak detection, which significantly improves generalization and robustness under data-scarce conditions.
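The attention-guided image-text fusion described above is not specified in detail here, so the following is only a minimal sketch of one common pattern, under stated assumptions. The text embedding acts as the query in scaled dot-product attention over visual patch embeddings, and the resulting visual context is concatenated with the text embedding before a linear jailbreak classifier. Learned query/key/value projections are omitted for brevity. All function names (`attend`, `fuse`, `jailbreak_score`), shapes, and the classifier head are hypothetical and are not the authors' architecture.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float], patches: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention weights of the text query over image patches.
    Assumption: patches serve directly as keys (no learned projection)."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([dot(query, p) * scale for p in patches])


def attend(query: Sequence[float], patches: Sequence[Sequence[float]]) -> Vector:
    """Attention-weighted average of the patch embeddings (the visual context)."""
    weights = attention_weights(query, patches)
    dim = len(patches[0])
    return [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]


def fuse(text_emb: Sequence[float], patches: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical fusion: concatenate the text embedding with its visual context."""
    return list(text_emb) + attend(text_emb, patches)


def jailbreak_score(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear head followed by a sigmoid; returns P(jailbreak)."""
    z = dot(fused, weights) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    text = [1.0, 0.0]                   # toy text embedding (query)
    patches = [[1.0, 0.0], [0.0, 1.0]]  # toy visual patch embeddings (keys and values)
    fused = fuse(text, patches)
    print(len(fused), jailbreak_score(fused, [0.5, -0.5, 1.0, -1.0], 0.0))
```

In a trained detector, the attention projections and classifier weights would be learned from labelled jailbreak and benign examples; here they are fixed toy values chosen only to show the data flow.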
📝 Abstract
Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors in the real world. Data-driven defenses such as jailbreak classifiers show promise, yet they struggle to generalize in domains where specialized datasets are scarce, limiting their effectiveness in robotics and other safety-critical contexts. To address this gap, we introduce J-DAPT, a lightweight framework for multimodal jailbreak detection through attention-based fusion and domain adaptation. J-DAPT integrates textual and visual embeddings to capture both semantic intent and environmental grounding, while aligning general-purpose jailbreak datasets with domain-specific reference data. Evaluations across autonomous driving, maritime robotics, and quadruped navigation show that J-DAPT boosts detection accuracy to nearly 100% with minimal overhead. These results demonstrate that J-DAPT provides a practical defense for securing VLMs in robotic applications. Additional materials are made available at: https://j-dapt.github.io.
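The abstract does not state which technique J-DAPT uses to align general-purpose jailbreak datasets with domain-specific reference data. Purely as an illustration, the sketch below swaps in one simple, well-known option: per-dimension mean and variance matching of embeddings, a diagonal simplification in the spirit of CORAL (correlation alignment). This is an assumption, not the authors' method, and `column_stats`, `align_to_domain`, and the toy data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def column_stats(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population standard deviation of a set of embeddings."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return means, stds


def align_to_domain(source: Sequence[Sequence[float]],
                    target: Sequence[Sequence[float]],
                    eps: float = 1e-8) -> Matrix:
    """Re-standardize source embeddings so each dimension matches the target's
    mean and standard deviation.
    Assumption: a diagonal, CORAL-like moment match stands in for the paper's
    (unspecified) domain-adaptation step."""
    s_mean, s_std = column_stats(source)
    t_mean, t_std = column_stats(target)
    return [
        # Standardize with source statistics, then re-color with target statistics;
        # eps guards against division by zero for constant dimensions.
        [(x - sm) / (ss + eps) * ts + tm
         for x, sm, ss, tm, ts in zip(row, s_mean, s_std, t_mean, t_std)]
        for row in source
    ]


if __name__ == "__main__":
    general = [[0.0, 10.0], [2.0, 14.0]]              # toy general-purpose jailbreak embeddings
    reference = [[5.0, 0.0], [7.0, 1.0], [6.0, 0.5]]  # toy robot-domain reference embeddings
    aligned = align_to_domain(general, reference)
    print(column_stats(aligned))
    print(column_stats(reference))
```

Full CORAL matches the entire covariance matrix rather than only its diagonal; the diagonal version keeps the sketch dependency-free at the cost of ignoring correlations between embedding dimensions.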