GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and Video

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of safety moderation for multimodal content (text, images, and video) by proposing a deep reasoning-based multimodal guardrail model. The approach employs a two-stage training paradigm: supervised fine-tuning (SFT) first endows the model with structured reasoning capabilities, and error-driven reinforcement learning (RL) then refines its decision-making process. Notably, this is the first multimodal guardrail framework to incorporate an explicit reasoning mechanism, which significantly improves its ability to identify challenging harmful samples. Evaluated across multiple safety moderation benchmarks, the proposed 2B and 4B models consistently outperform existing methods, with the 2B variant achieving an F1 score 5.3% higher than that of the next-best model.

📝 Abstract
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
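To make the abstract's "error-driven exploration reward" concrete, here is a minimal sketch of one plausible formulation: a base correctness term plus a bonus that scales with how often the model has historically misclassified a sample, so that solving hard samples pays more and RL rollouts are pushed toward deeper reasoning on them. The function name, signature, and weighting scheme are illustrative assumptions, not the paper's actual reward design.

```python
def error_driven_reward(correct: bool, error_rate: float, bonus_weight: float = 1.0) -> float:
    """Illustrative error-driven exploration reward (assumed form, not from the paper).

    correct: whether the guardrail's moderation decision was right.
    error_rate: fraction of past rollouts that misclassified this sample, in [0, 1].
    bonus_weight: scaling factor for the exploration bonus (hypothetical knob).
    """
    # Base term: reward correct decisions, penalize incorrect ones.
    base = 1.0 if correct else -1.0
    # Exploration bonus: hard samples (high historical error rate)
    # yield extra reward when solved; easy samples get little or none.
    bonus = bonus_weight * error_rate if correct else 0.0
    return base + bonus

# Solving a sample the model usually fails on earns more than solving an easy one:
hard = error_driven_reward(correct=True, error_rate=0.8)   # 1.8
easy = error_driven_reward(correct=True, error_rate=0.0)   # 1.0
miss = error_driven_reward(correct=False, error_rate=0.8)  # -1.0
```

Under this assumed shaping, the reward gap between hard and easy successes is what "incentivizes deeper reasoning on hard samples" during the RL stage.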
Problem

Research questions and friction points this paper is trying to address.

multi-modal guardrail
content moderation
text-image-video safety
harmful content detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-based guardrail
multi-modal moderation
two-stage training
error-driven reinforcement learning
cross-modal safety