Robust and Efficient Guardrails with Latent Reasoning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high latency and excessive token consumption that reasoning-based safety guardrails for large language models incur at inference time, which hinder their deployment in high-throughput scenarios. The authors propose COLAGUARD, presented as the first approach to compress explicit multi-step safety reasoning into continuous representations in latent space. A staged training curriculum and direct hidden-state propagation at inference let the model reach a safety judgment without generating an explicit rationale. Across ten prompt- and response-moderation settings spanning eight safety benchmarks, COLAGUARD improves macro-F1 by 8.24 points over Llama Guard 3 and matches the explicit-reasoning baseline GuardReasoner in macro-F1, while delivering a 12.9× speedup and a 22.4× reduction in token usage. The results suggest that safety robustness and inference efficiency need not be traded off against each other.
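For readers unfamiliar with the headline metric, the following is a minimal sketch of macro-F1 under its standard definition (the unweighted mean of per-class F1 scores). The labels and predictions are made-up illustration data, not results from the paper, and the paper may aggregate scores across benchmarks differently.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical moderation labels, for illustration only.
y_true = ["unsafe", "unsafe", "safe", "safe", "safe", "safe"]
y_pred = ["unsafe", "safe", "safe", "safe", "safe", "unsafe"]

score = macro_f1(y_true, y_pred)
print(round(score * 100, 2))  # F1(unsafe) = 0.5, F1(safe) = 0.75, mean = 62.5
```

On this 0 to 100 scale, an "8.24-point improvement" corresponds to a difference of 8.24 between two such scores.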

📝 Abstract

Maintaining the safety of large language models (LLMs) is crucial as they are increasingly deployed in real-world applications. Existing safety guardrails typically rely on single-pass classification or, more recently, distilled reasoning. Reasoning-based guardrails significantly outperform classification-only baselines, but they incur substantial query latency and token overhead that make them impractical for high-throughput deployment. To address this challenge, we propose COLAGUARD, a guardrail model that transfers multi-step safety reasoning into a continuous latent space through a stage-wise training curriculum, enabling direct hidden-state propagation at inference. Evaluated on ten prompt- and response-moderation settings spanning eight safety benchmarks, COLAGUARD improves macro-F1 by 8.24 points over Llama Guard 3 and matches our explicit reasoning baseline, GuardReasoner, in macro-F1 while delivering a 12.9× speedup and a 22.4× reduction in token usage. Our results suggest that latent reasoning offers a practical alternative to explicit rationale generation for deployable guardrails, jointly improving safety robustness and inference efficiency rather than treating them as competing objectives.
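To make "direct hidden-state propagation" concrete, here is a toy sketch contrasting explicit rationale decoding with latent reasoning. It is not COLAGUARD's architecture, which this page does not describe. The weights are random, all names (`W_STEP`, `EMBED`, `W_HEAD`, `forward`) are hypothetical, and the predicted labels are meaningless. The only point is structural: the explicit loop emits one token per reasoning step, while the latent loop feeds each hidden state back as the next input and emits none.

```python
import math
import random

DIM, VOCAB = 4, 6
rng = random.Random(0)  # fixed seed so the toy weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy parameters; a real guardrail would use a trained transformer.
W_STEP = rand_matrix(DIM, DIM)   # stand-in for one forward pass of the backbone
EMBED = rand_matrix(VOCAB, DIM)  # token embedding table, also used as output projection
W_HEAD = rand_matrix(2, DIM)     # binary head: index 0 = safe, index 1 = unsafe


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(x):
    """One toy backbone step: a linear map followed by tanh."""
    return [math.tanh(z) for z in matvec(W_STEP, x)]


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


def explicit_reasoning(x, steps):
    """Decode one rationale token per step and re-embed it as the next input."""
    tokens = []
    for _ in range(steps):
        h = forward(x)
        tok = argmax(matvec(EMBED, h))  # hidden state -> discrete token
        tokens.append(tok)
        x = EMBED[tok]                  # discrete token -> input embedding
    return argmax(matvec(W_HEAD, forward(x))), tokens


def latent_reasoning(x, steps):
    """Feed the hidden state straight back as the next input; emit no tokens."""
    for _ in range(steps):
        x = forward(x)
    return argmax(matvec(W_HEAD, forward(x))), []


prompt = [0.5, -0.2, 0.1, 0.9]  # made-up prompt representation
label_explicit, rationale = explicit_reasoning(prompt, steps=8)
label_latent, no_tokens = latent_reasoning(prompt, steps=8)
print(len(rationale), len(no_tokens))  # 8 rationale tokens versus 0
```

In this sketch both loops run the same number of steps, so it shows only the token saving. Any speedup in a real system would depend on how many latent steps are needed compared with the length of an explicit rationale.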

Problem

Research questions and friction points this paper is trying to address.

safety guardrails

large language models

inference efficiency

latency

token overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning

safety guardrails

efficient inference