COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses a critical gap in the safety evaluation of large language models (LLMs), which has predominantly focused on general risks while neglecting systematic assessment of compliance with organization-specific policies such as allowlists and denylists. To bridge this gap, we propose COMPASS, the first evaluation framework tailored to organizational policy alignment. COMPASS comprises 5,920 test queries spanning eight industries, integrating policy-driven query generation, adversarial edge cases, human validation, and multi-model benchmarking. Experiments across seven mainstream LLMs reveal that while models correctly fulfill over 95% of permissible requests, they fail to reject 60–87% of prohibited adversarial queries, exposing significant vulnerabilities in high-stakes policy enforcement scenarios. This study thus fills a crucial void in enterprise-grade AI safety evaluation.

📝 Abstract
As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results demonstrate that current LLMs lack the robustness required for policy-critical deployments, establishing COMPASS as an essential evaluation framework for organizational AI safety.
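The abstract's headline numbers reduce to two rates: the fraction of allowlist (permitted) queries a model fulfills, and the fraction of denylist (prohibited) queries it refuses. The sketch below shows how such a scoring pass might look. It is illustrative only: the paper's actual harness uses human validation and adversarial query generation, and all names and the keyword-based refusal heuristic here are assumptions, not COMPASS's implementation.

```python
# Hypothetical sketch of allowlist/denylist compliance scoring.
# The refusal detector and all identifiers are illustrative assumptions;
# COMPASS itself relies on human validation rather than keyword matching.
from dataclasses import dataclass

@dataclass
class PolicyQuery:
    text: str
    expected: str  # "fulfill" for allowlist items, "refuse" for denylist items

# Crude refusal heuristic; a real harness would use a classifier or reviewers.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "against policy")

def is_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def score(queries, responses):
    """Return (allowlist fulfillment rate, denylist refusal rate)."""
    allow_ok = allow_n = deny_ok = deny_n = 0
    for q, r in zip(queries, responses):
        if q.expected == "fulfill":
            allow_n += 1
            allow_ok += not is_refusal(r)
        else:
            deny_n += 1
            deny_ok += is_refusal(r)
    return allow_ok / allow_n, deny_ok / deny_n
```

On the paper's reported results, the first rate exceeds 0.95 for all seven models while the second falls between 0.13 and 0.40 on adversarial denylist queries; the asymmetry is visible precisely because the two rates are computed separately per policy side.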
Problem

Research questions and friction points this paper is trying to address.

policy alignment
large language models
organizational safety
compliance evaluation
denylist enforcement
Innovation

Methods, ideas, or system contributions that make the work stand out.

policy alignment
large language models
organizational safety
adversarial robustness
denylist compliance