Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing

πŸ“… 2025-04-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address high latency, substantial memory overhead, prohibitive deployment costs, and unstructured outputs in large language model (LLM) input moderation, this paper proposes UniGuardβ€”a lightweight, efficient safety guard system. Methodologically, it introduces a novel task-customized synthetic data generation mechanism, constructs the multitask pre-trained model MultiTaskGuard, and designs a search-based parameter-space fusion framework to jointly optimize diverse safety policies within a single model. Evaluated on seven public datasets and four internally curated guard benchmarks, UniGuard achieves F1 scores 29.92 points higher than Aegis-LlamaGuard and 21.62 points higher than GPT-4o, significantly outperforming existing LLMs and third-party API-based solutions. Key contributions include: (1) a scalable multitask modeling paradigm; (2) a human-annotation-free synthetic data generation strategy; and (3) an end-to-end guard architecture delivering low computational overhead and high output consistency.
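The task-customized synthetic data generation described above can be pictured as composing policy-specific instructions for a teacher LLM and collecting labeled safe/unsafe examples without human annotation. The sketch below is a minimal illustration under assumed conventions: the policy fields, prompt wording, and the `build_generation_prompt` helper are hypothetical, not the paper's actual templates.

```python
# Hedged sketch: build a task-specific instruction prompt that asks a teacher
# LLM for labeled guardrail training examples. Field names and wording are
# illustrative assumptions, not the paper's templates.

def build_generation_prompt(policy_name: str, policy_definition: str,
                            label: str, n_examples: int = 5) -> str:
    """Compose an instruction prompt requesting labeled examples for one policy."""
    behavior = "violate" if label == "unsafe" else "comply with"
    return (
        f"You are generating training data for the guardrail policy "
        f"'{policy_name}': {policy_definition}\n"
        f"Write {n_examples} distinct user messages that {behavior} this policy. "
        f"Return one message per line."
    )

# Example: request unsafe (policy-violating) inputs for a hypothetical PII policy.
prompt = build_generation_prompt(
    "pii_leakage",
    "The assistant must not reveal personal data such as emails or SSNs.",
    label="unsafe",
)
```

Pairing each policy with both "unsafe" and "safe" generation prompts yields the binary-labeled corpus a small classifier can be fine-tuned on.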

πŸ“ Abstract
The trend towards large language models (LLMs) for guardrailing against undesired behaviors is increasing and has shown promise for censoring user inputs. However, increased latency, memory consumption, hosting expenses, and non-structured outputs can make their use prohibitive. In this work, we show that task-specific data generation can lead to fine-tuned classifiers that significantly outperform the current state of the art (SoTA) while being orders of magnitude smaller. Secondly, we show that using a single model, MultiTaskGuard, pretrained on a large synthetically generated dataset with unique task instructions, further improves generalization. Thirdly, our most performant models, UniGuard, are found using our proposed search-based model merging approach, which finds an optimal set of parameters to combine single-policy models and multi-policy guardrail models. On 7 public datasets and 4 guardrail benchmarks we created, our efficient guardrail classifiers improve over the best performing SoTA publicly available LLMs and 3rd-party guardrail APIs in detecting unsafe and safe behaviors, with average F1 score improvements of **29.92** points over Aegis-LlamaGuard and **21.62** over gpt-4o, respectively. Lastly, our guardrail synthetic data generation process uses custom task-specific guardrail policies…
Problem

Research questions and friction points this paper is trying to address.

Reducing latency and memory in LLM guardrailing
Improving generalization with multi-task learning
Optimizing model fusion for better performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-specific data generation for efficient classifiers
MultiTaskGuard model with synthetic dataset pretraining
Search-based model merging for optimal parameter combination
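The third innovation, search-based model merging, can be sketched as searching over mixing weights for a convex combination of checkpoint parameters and keeping the combination that scores best on a validation metric. This is a minimal toy sketch, not the paper's algorithm: the state dicts hold scalars instead of tensors, `evaluate` is a stand-in for validation F1, and random search stands in for whatever search procedure UniGuard actually uses.

```python
import random

def merge_state_dicts(state_dicts, weights):
    """Weighted average of parameter dicts (simple parameter-space fusion)."""
    return {name: sum(w * sd[name] for sd, w in zip(state_dicts, weights))
            for name in state_dicts[0]}

def evaluate(merged):
    """Stand-in for a validation metric (e.g. guardrail F1): here, negative
    squared distance to a hypothetical 'ideal' parameter setting."""
    target = {"w": 0.5, "b": 1.5}
    return -sum((merged[k] - target[k]) ** 2 for k in merged)

def search_merge(state_dicts, trials=200, seed=0):
    """Random search over convex mixing weights; return the best found."""
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(trials):
        raw = [rng.random() for _ in state_dicts]
        total = sum(raw)
        w = [x / total for x in raw]  # normalize to a convex combination
        score = evaluate(merge_state_dicts(state_dicts, w))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Two toy "models" (e.g. a single-policy and a multi-policy guard).
models = [{"w": 0.0, "b": 1.0}, {"w": 1.0, "b": 2.0}]
best_w, best_score = search_merge(models)
```

In practice the same loop would average real checkpoint tensors and score each candidate on held-out guardrail data; the key idea is that merging happens in parameter space, so inference still runs a single model.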
James O'Neill
DynamoAI, San Francisco, California, USA
Santhosh Subramanian
DynamoAI, San Francisco, California, USA
Eric Lin
Dynamo AI, Harvard
AI Safety · LLM Evals · Guardrails
Vaikkunth Mugunthan
PhD at MIT; CEO/Co-Founder @ DynamoFL
Federated Learning · Differential Privacy · Machine Learning · LLMs