SafeCoT: Improving VLM Safety with Minimal Reasoning

πŸ“… 2025-06-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the weak safety-response capability of vision-language models (VLMs) in high-risk or ambiguous scenarios, this paper proposes SafeCoT, a lightweight, interpretable, rule-guided chain-of-thought (CoT) supervision framework. The method introduces a paradigm of minimalist rule-based CoT supervision that eliminates the need for large-scale safety annotations or complex modeling. It combines rule-driven CoT reasoning supervision, a context-aware refusal mechanism, and lightweight safety fine-tuning to jointly improve risk-detection accuracy and the appropriateness of refusals. Extensive evaluation across multiple benchmarks shows significant gains: the average over-refusal rate drops by 32.7%, and safe-refusal accuracy rises by 28.4%. Crucially, the framework achieves strong cross-scenario generalization and deployment scalability using only minimal training data, offering an efficient, transparent, and practical path to safety alignment for VLMs.
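The rule-guided supervision described above can be illustrated with a minimal sketch: a small rule table maps a risk tag to a rationale, from which a CoT reasoning trace and a refusal (or answer) target are built for fine-tuning. The rule names, templates, and function below are illustrative assumptions, not the paper's actual rules or API.

```python
# Hypothetical sketch of rule-guided CoT supervision targets, in the spirit
# of SafeCoT's minimalist rule-based CoT supervision. Rules and templates
# here are invented for illustration.

RISK_RULES = {
    "weapon": "The image shows a weapon, and the request seeks operational detail.",
    "self_harm": "The request relates to self-harm and requires a careful refusal.",
}

def build_cot_target(risk_tag, question):
    """Return a (reasoning, response) supervision pair for safety fine-tuning."""
    if risk_tag is None:
        # Benign input: supervise a direct answer, which helps curb over-refusal.
        return ("No safety rule applies to this request.", "ANSWER")
    rationale = RISK_RULES[risk_tag]
    reasoning = (
        f"Step 1: {rationale} "
        "Step 2: A safety rule is triggered, so I should refuse."
    )
    return (reasoning, "REFUSE")
```

Because the rationale is generated from an explicit rule rather than learned end to end, each refusal in the training data stays interpretable and auditable.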

πŸ“ Abstract
Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling, SafeCoT uses minimal supervision to help models reason about safety risks and make context-aware refusals. Experiments across multiple benchmarks show that SafeCoT significantly reduces overrefusal and enhances generalization, even with limited training data. Our approach offers a scalable solution for aligning VLMs with safety-critical objectives.
Problem

Research questions and friction points this paper is trying to address.

Improving safety in vision-language models (VLMs)
Reducing overrefusal with minimal supervision
Enhancing generalization in high-risk scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight rule-based CoT supervision
Minimal safety annotation needed
Improves refusal behavior in VLMs
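One way the minimal-annotation idea could work in practice is to blend a small set of rule-labeled refusal examples with sampled benign examples, so the model also learns when not to refuse. The function and ratio below are a hypothetical sketch of such data mixing, not the paper's actual training recipe.

```python
import random

def mix_training_set(refusal_examples, benign_examples, benign_ratio=0.5, seed=0):
    """Blend refusal and benign CoT examples into one fine-tuning set.

    benign_ratio is the target fraction of benign examples in the final mix;
    including them counteracts over-refusal on harmless queries.
    """
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    n_benign = int(len(refusal_examples) * benign_ratio / (1 - benign_ratio))
    sampled = rng.sample(benign_examples, min(n_benign, len(benign_examples)))
    data = refusal_examples + sampled
    rng.shuffle(data)
    return data
```

With `benign_ratio=0.5`, two refusal examples would be paired with two benign ones, keeping the supervision set small while covering both behaviors.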