LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the tension between accurate policy reasoning and low-latency inference in AI safety guardrails when policies are specified dynamically at inference time. The authors propose the Latent Policy Guardrail (LPG) framework, which, for the first time, compresses explicit policy reasoning into a small set of semantically supervised continuous latent states. At inference time, LPG generates only a compact verdict anchored to the violated policy clauses, which keeps safety judgments both efficient and auditable. Using just 10 latent tokens on a 4B-parameter model, LPG achieves an average safety accuracy of 84.5% and an F1 score of 77.9% across multiple policy guardrail benchmarks, while running roughly 11 times faster than Qwen3-4B-Thinking under the single-sample evaluation setup.

📝 Abstract

Guardrails are a critical safety layer for modern AI systems, but their operating regime is changing. As LLMs are deployed as customized assistants, safety policies are increasingly specified at inference time by users, organizations, or regulatory contexts. This makes safety enforcement fundamentally dynamic: the guardrail should adapt to changing safety policies without retraining. Yet this requirement creates a fundamental tension: faithfully judging complex policy contexts demands reasoning capability, while practical deployment requires low-latency responses. We introduce Latent Policy Guardrail (LPG), a guardrail framework that learns semantic latent deliberation over dynamic policies. LPG compresses the internal deliberation needed for intent interpretation and policy grounding into continuous states supervised by decision-relevant semantics. At inference time, it generates only a compact verdict anchored to the violated policy clauses, preserving auditability while avoiding the latency of explicit reasoning. Across policy guardrail benchmarks, LPG-4B reaches 84.5% average safety accuracy and 77.9% F1 by compressing deliberation into just 10 latent tokens, outperforming the strongest dynamic baseline while running roughly 11 times faster than Qwen3-4B-Thinking under the single-sample evaluation setup. Code and data are available at https://github.com/SaFo-Lab/Latent_Policy_Guard.
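
To make the idea of a "compact verdict anchored to the violated policy clauses" concrete, the following is a minimal sketch of how a downstream system might consume such a verdict and recover an audit trail. The verdict string format (`unsafe: 1, 3`), the names `Verdict`, `parse_verdict`, and `audit_trail`, and the sample policy are all hypothetical illustrations. They are not taken from the paper or from the LPG repository, whose actual output format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for a clause-anchored guardrail verdict."""
    safe: bool
    violated: tuple[int, ...]  # 1-based indices into the policy clause list


def parse_verdict(raw: str, num_clauses: int) -> Verdict:
    """Parse an assumed verdict string such as 'unsafe: 1, 3' or 'safe'."""
    label, _, rest = raw.strip().partition(":")
    label = label.strip().lower()
    if label not in {"safe", "unsafe"}:
        raise ValueError(f"unknown label: {label!r}")
    # Collect cited clause indices, ignoring empty fragments.
    ids = tuple(int(tok) for tok in rest.split(",") if tok.strip())
    if any(i < 1 or i > num_clauses for i in ids):
        raise ValueError("clause index outside the supplied policy")
    if label == "safe" and ids:
        raise ValueError("a safe verdict cannot cite violated clauses")
    return Verdict(safe=(label == "safe"), violated=ids)


def audit_trail(verdict: Verdict, clauses: list[str]) -> list[str]:
    """Return the text of each cited clause so a reviewer can check the decision."""
    return [clauses[i - 1] for i in verdict.violated]


# Hypothetical policy supplied at inference time; no retraining involved.
policy = [
    "No medical dosage advice.",
    "No legal advice.",
    "No discussion of competitors.",
]
verdict = parse_verdict("unsafe: 1, 3", num_clauses=len(policy))
for clause in audit_trail(verdict, policy):
    print(clause)
```

Because the verdict cites clause indices rather than free-form reasoning, the output stays short while still pointing a reviewer to the exact provisions behind the decision, which is the auditability property the abstract describes.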

Problem

Research questions and friction points this paper is trying to address.

guardrails

dynamic policies

latency

policy reasoning

safety enforcement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Policy Guardrail

dynamic safety policies

semantic latent deliberation