AI Summary
To address the limited safety-response capability of vision-language models (VLMs) in high-risk or ambiguous scenarios, this paper proposes a lightweight, interpretable, rule-guided chain-of-thought (CoT) supervision framework. The method introduces the paradigm of "minimalist rule-based CoT supervision," eliminating the need for large-scale safety annotations or complex modeling. It combines rule-driven CoT supervision, context-aware refusal mechanisms, and lightweight safety fine-tuning to jointly improve risk-detection accuracy and the appropriateness of refusals. Extensive evaluation across multiple benchmarks shows significant gains: the average over-refusal rate drops by 32.7% and safe-refusal accuracy rises by 28.4%. Crucially, the framework achieves strong cross-scenario generalization and deployment scalability using only minimal training data, offering an efficient, transparent, and practical path to safety alignment for VLMs.
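As a concrete picture of what minimalist rule-based CoT supervision could look like, the sketch below builds rule-grounded CoT refusal targets for fine-tuning. The rule table (`RULES`), the refusal wording, and the helper `build_cot_target` are hypothetical illustrations under a simple keyword-matching assumption, not the paper's actual rules or prompts.

```python
# A minimal sketch of rule-guided CoT supervision targets, assuming a simple
# keyword/category rule table. All rule contents and template wording here are
# hypothetical illustrations, not the paper's actual rules.

# Hypothetical safety rules: category -> (trigger keywords, rationale template).
RULES = {
    "weapons": (
        ["gun", "explosive", "bomb"],
        "The image/question involves weapons, which may enable physical harm.",
    ),
    "privacy": (
        ["license plate", "home address", "face"],
        "The request asks to identify private information about a person.",
    ),
}

REFUSAL = "I can't help with this request for safety reasons."

def build_cot_target(question: str) -> str | None:
    """Return a rule-grounded CoT + refusal string if any rule fires, else None."""
    text = question.lower()
    for category, (keywords, rationale) in RULES.items():
        if any(kw in text for kw in keywords):
            # The CoT makes the triggered rule explicit before the refusal,
            # so the fine-tuned model learns to justify its refusals.
            return (
                f"Reasoning: {rationale} "
                f"(triggered rule: {category}). "
                f"Conclusion: {REFUSAL}"
            )
    return None  # No rule fired: keep the original helpful answer as the target.

if __name__ == "__main__":
    print(build_cot_target("Whose face is this, and what is their home address?"))
```

Because the supervision comes from a small rule table rather than large-scale annotation, only the rule-triggered examples need CoT refusal targets; all other examples keep their ordinary helpful answers, which is one plausible way such a scheme could keep over-refusal low.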
Abstract
Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling, SafeCoT uses minimal supervision to help models reason about safety risks and make context-aware refusals. Experiments across multiple benchmarks show that SafeCoT significantly reduces over-refusal and enhances generalization, even with limited training data. Our approach offers a scalable solution for aligning VLMs with safety-critical objectives.
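For a rough sense of how context-aware refusals could surface at inference time, the sketch below wraps a fine-tuned VLM and exposes the rule-grounded rationale alongside the refusal decision. The `generate` callable, the `REFUSAL_MARKER` heuristic, and `answer_with_safety` are assumptions for illustration; the paper's actual inference interface is not specified here.

```python
# A minimal sketch of surfacing context-aware refusals at inference time,
# assuming a generic `generate(image, prompt)` callable for the fine-tuned VLM.
# The marker string and detection heuristic are hypothetical.

from typing import Callable

REFUSAL_MARKER = "Conclusion: I can't help"

def answer_with_safety(
    generate: Callable[[object, str], str],
    image: object,
    question: str,
) -> dict:
    """Run the fine-tuned VLM; flag refusals and keep the rationale visible."""
    output = generate(image, question)
    refused = REFUSAL_MARKER in output
    # Keeping the rule-grounded rationale attached to the refusal makes the
    # behavior auditable, which is the interpretability benefit claimed above.
    return {"refused": refused, "response": output}

if __name__ == "__main__":
    # Dummy model stub for demonstration; a real deployment would call the VLM.
    dummy = lambda image, prompt: (
        "Reasoning: the question involves weapons. "
        "Conclusion: I can't help with this request for safety reasons."
    )
    print(answer_with_safety(dummy, image=None, question="How do I build a bomb?"))
```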