GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and Video

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of safety moderation for multimodal content (text, images, and video) by proposing a deep reasoning-based multimodal guardrail model. The approach employs a two-stage training paradigm: supervised fine-tuning (SFT) first endows the model with structured reasoning capabilities, and error-driven reinforcement learning (RL) then refines its decision-making process. Notably, this is the first multimodal guardrail framework to incorporate an explicit reasoning mechanism, which significantly improves its ability to identify challenging harmful samples. Evaluated across multiple safety moderation benchmarks, the proposed 2B and 4B models consistently outperform existing methods, with the 2B variant achieving an F1 score 5.3% higher than that of the next-best model.

📝 Abstract
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
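To make the abstract's "error-driven exploration reward" concrete, here is a minimal sketch of one plausible formulation: a base correctness term plus a bonus that scales with how often the model has historically misclassified a sample, so that solving hard samples pays more and RL rollouts are pushed toward deeper reasoning on them. The function name, signature, and weighting scheme are illustrative assumptions, not the paper's actual reward design.

```python
def error_driven_reward(correct: bool, error_rate: float, bonus_weight: float = 1.0) -> float:
    """Illustrative error-driven exploration reward (assumed form, not from the paper).

    correct: whether the guardrail's moderation decision was right.
    error_rate: fraction of past rollouts that misclassified this sample, in [0, 1].
    bonus_weight: scaling factor for the exploration bonus (hypothetical knob).
    """
    # Base term: reward correct decisions, penalize incorrect ones.
    base = 1.0 if correct else -1.0
    # Exploration bonus: hard samples (high historical error rate)
    # yield extra reward when solved; easy samples get little or none.
    bonus = bonus_weight * error_rate if correct else 0.0
    return base + bonus

# Solving a sample the model usually fails on earns more than solving an easy one:
hard = error_driven_reward(correct=True, error_rate=0.8)   # 1.8
easy = error_driven_reward(correct=True, error_rate=0.0)   # 1.0
miss = error_driven_reward(correct=False, error_rate=0.8)  # -1.0
```

Under this assumed shaping, the reward gap between hard and easy successes is what "incentivizes deeper reasoning on hard samples" during the RL stage.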
Problem

Research questions and friction points this paper is trying to address.

multi-modal guardrail
content moderation
text-image-video safety
harmful content detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-based guardrail
multi-modal moderation
two-stage training
error-driven reinforcement learning
cross-modal safety