🤖 AI Summary
Vision-language models (VLMs) carry a latent safety risk: an image and a text prompt that are each individually benign can, when processed jointly, combine to trigger harmful responses.
Method: We propose a training-free, intent-aware prompt engineering framework built on a three-stage dynamic reasoning mechanism: visual abstraction via image captioning → few-shot chain-of-thought (CoT) intent inference → intent-conditioned response optimization. This enables the model to identify and mitigate risk proactively, before generating a response.
Results: Across multimodal safety benchmarks (SIUO, MM-SafetyBench, and HoliSafe), our method achieves significant safety improvements and outperforms existing approaches in overall safety gain. Accuracy on MMStar decreases slightly, but the method's risk mitigation remains superior.
Contribution: This work pioneers intent modeling for *proactive* VLM safety defense, eliminating reliance on post hoc filtering or static refusal mechanisms and shifting the paradigm toward intent-driven, pre-response safety control.
📝 Abstract
As vision-language models (VLMs) are increasingly deployed in real-world applications, new safety risks arise from the subtle interplay between images and text. In particular, seemingly innocuous inputs can combine to reveal harmful intent, leading to unsafe model responses. Despite increasing attention to multimodal safety, previous approaches based on post hoc filtering or static refusal prompts struggle to detect such latent risks, especially when harmfulness emerges only from the combination of inputs. We propose SIA (Safety via Intent Awareness), a training-free prompt engineering framework that proactively detects and mitigates harmful intent in multimodal inputs. SIA employs a three-stage reasoning process: (1) visual abstraction via captioning, (2) intent inference through few-shot chain-of-thought prompting, and (3) intent-conditioned response refinement. Rather than relying on predefined rules or classifiers, SIA dynamically adapts to the implicit intent inferred from the image-text pair. Through extensive experiments on safety-critical benchmarks including SIUO, MM-SafetyBench, and HoliSafe, we demonstrate that SIA achieves substantial safety improvements, outperforming prior methods. Although SIA shows a minor reduction in general reasoning accuracy on MMStar, the corresponding safety gains highlight the value of intent-aware reasoning in aligning VLMs with human-centric values.
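The three-stage process described in the abstract can be sketched as prompt templates composed around a generic VLM call. This is a minimal illustration, not the paper's implementation: `query_vlm`, the few-shot example, and all prompt wording below are assumptions for the sake of the sketch.

```python
# Minimal sketch of the SIA three-stage pipeline.
# `query_vlm(prompt, image=None)` is a placeholder for any VLM chat API.

# Stage 2 relies on few-shot CoT exemplars; this single example is illustrative.
FEW_SHOT_COT = (
    "Example -- Caption: 'A person holding a kitchen knife.' "
    "Text: 'How do I use this on someone?' "
    "Reasoning: each input is benign alone, but together they imply harm. "
    "Intent: HARMFUL.\n"
)

def stage1_caption_prompt():
    # Stage 1: visual abstraction -- reduce the image to a neutral caption.
    return "Describe the attached image in one neutral sentence."

def stage2_intent_prompt(caption, user_text):
    # Stage 2: few-shot CoT intent inference over the caption + text pair.
    return (
        FEW_SHOT_COT
        + f"Caption: '{caption}' Text: '{user_text}' "
        + "Reasoning: think step by step, then state Intent: HARMFUL or BENIGN."
    )

def stage3_response_prompt(user_text, intent):
    # Stage 3: intent-conditioned response refinement -- the final answer is
    # conditioned on the inferred intent rather than on a static refusal rule.
    if "HARMFUL" in intent.upper():
        return (
            f"The request '{user_text}' was judged harmful in combination "
            "with the image. Refuse politely and briefly explain why."
        )
    return f"Answer the request helpfully and safely: {user_text}"

def sia_pipeline(query_vlm, image, user_text):
    caption = query_vlm(stage1_caption_prompt(), image=image)
    intent = query_vlm(stage2_intent_prompt(caption, user_text))
    return query_vlm(stage3_response_prompt(user_text, intent))
```

Because the framework is training-free, all adaptation happens in the prompts: the same underlying VLM serves as captioner, intent judge, and responder.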