๐ค AI Summary
Existing LLM safety guardrails suffer from limited real-time performance, inadequate multimodal support, and poor interpretability. To address these gaps, this paper proposes a native multimodal safety guardian system designed for enterprise deployment. The system unifies processing of text, image, and audio inputs, employing category-specific LoRA adapters and a teacher-assisted reasoning-chain annotation pipeline to establish a four-dimensional safety framework: toxicity, gender bias, data privacy, and prompt injection. Leveraging efficient LoRA fine-tuning and a curated multimodal safety dataset, the system delivers real-time, auditable, and interpretable compliance enforcement. Extensive evaluation demonstrates significant improvements over WildGuard, LlamaGuard-4, and GPT-4.1 across multiple safety benchmarks, achieving state-of-the-art performance among both open-source and proprietary modelsโmaking it suitable for highly regulated production environments.
๐ Abstract
The increasing deployment of Large Language Models (LLMs) across enterprise and mission-critical domains has underscored the urgent need for robust guardrailing systems that ensure safety, reliability, and compliance. Existing solutions often struggle with real-time oversight, multi-modal data handling, and explainability -- limitations that hinder their adoption in regulated environments. Existing guardrails largely operate in isolation, focused on text alone making them inadequate for multi-modal, production-scale environments. We introduce Protect, natively multi-modal guardrailing model designed to operate seamlessly across text, image, and audio inputs, designed for enterprise-grade deployment. Protect integrates fine-tuned, category-specific adapters trained via Low-Rank Adaptation (LoRA) on an extensive, multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context-aware labels across modalities. Experimental results demonstrate state-of-the-art performance across all safety dimensions, surpassing existing open and proprietary models such as WildGuard, LlamaGuard-4, and GPT-4.1. Protect establishes a strong foundation for trustworthy, auditable, and production-ready safety systems capable of operating across text, image, and audio modalities.