COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses a critical gap in the safety evaluation of large language models (LLMs), which has predominantly focused on general risks while neglecting systematic assessment of compliance with organization-specific policies such as allowlists and denylists. To bridge this gap, we propose COMPASS, the first evaluation framework tailored to organizational policy alignment. COMPASS comprises 5,920 test queries spanning eight industries, integrating policy-driven query generation, adversarial edge cases, human validation, and multi-model benchmarking. Experiments across seven mainstream LLMs reveal that while models correctly fulfill over 95% of permissible requests, they fail to reject 60–87% of prohibited adversarial queries, exposing significant vulnerabilities in high-stakes policy enforcement scenarios. This study thus fills a crucial void in enterprise-grade AI safety evaluation.

📝 Abstract
As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results demonstrate that current LLMs lack the robustness required for policy-critical deployments, establishing COMPASS as an essential evaluation framework for organizational AI safety.
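The abstract's headline numbers reduce to two rates: the fraction of allowlist (permitted) queries a model fulfills, and the fraction of denylist (prohibited) queries it refuses. The sketch below shows how such a scoring pass might look. It is illustrative only: the paper's actual harness uses human validation and adversarial query generation, and all names and the keyword-based refusal heuristic here are assumptions, not COMPASS's implementation.

```python
# Hypothetical sketch of allowlist/denylist compliance scoring.
# The refusal detector and all identifiers are illustrative assumptions;
# COMPASS itself relies on human validation rather than keyword matching.
from dataclasses import dataclass

@dataclass
class PolicyQuery:
    text: str
    expected: str  # "fulfill" for allowlist items, "refuse" for denylist items

# Crude refusal heuristic; a real harness would use a classifier or reviewers.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "against policy")

def is_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def score(queries, responses):
    """Return (allowlist fulfillment rate, denylist refusal rate)."""
    allow_ok = allow_n = deny_ok = deny_n = 0
    for q, r in zip(queries, responses):
        if q.expected == "fulfill":
            allow_n += 1
            allow_ok += not is_refusal(r)
        else:
            deny_n += 1
            deny_ok += is_refusal(r)
    return allow_ok / allow_n, deny_ok / deny_n
```

On the paper's reported results, the first rate exceeds 0.95 for all seven models while the second falls between 0.13 and 0.40 on adversarial denylist queries; the asymmetry is visible precisely because the two rates are computed separately per policy side.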
Problem

Research questions and friction points this paper is trying to address.

policy alignment
large language models
organizational safety
compliance evaluation
denylist enforcement
Innovation

Methods, ideas, or system contributions that make the work stand out.

policy alignment
large language models
organizational safety
adversarial robustness
denylist compliance