🤖 AI Summary
Large language models (LLMs) frequently generate unauthorized responses under role-based access control (RBAC), judging authorization boundaries unreliably, especially at fine-grained table- or column-level granularity.
Method: We propose the first role-conditioned refusal evaluation framework tailored to fine-grained RBAC policies, built upon an enhanced security benchmark dataset derived from Spider and BIRD. Our approach integrates zero-shot/few-shot prompting, a generator-verifier two-stage architecture, LoRA fine-tuning, and native PostgreSQL role policy enforcement to ensure policy consistency.
Contribution/Results: Experiments show that the two-stage verification significantly improves refusal accuracy; LoRA fine-tuning achieves the highest execution accuracy; and policy complexity negatively correlates with model compliance. This work is the first to systematically characterize the trade-off between secure refusal and functional utility, establishing a reproducible evaluation benchmark and actionable technical pathway for trustworthy access control in LLMs.
📝 Abstract
Access control is a cornerstone of secure computing, yet large language models often blur role boundaries by producing unrestricted responses. We study role-conditioned refusals, focusing on the LLM's ability to adhere to access control policies by answering when authorized and refusing when not. To evaluate this behavior, we created a novel dataset that extends the Spider and BIRD text-to-SQL datasets, augmenting both with realistic PostgreSQL role-based policies at the table and column levels. We compare three designs: (i) zero- or few-shot prompting, (ii) a two-step generator-verifier pipeline that checks generated SQL against the policy, and (iii) LoRA fine-tuned models that learn permission awareness directly. Across multiple model families, explicit verification (the two-step pipeline) improves refusal precision and lowers false permits, while fine-tuning achieves a stronger balance between safety and utility (measured by execution accuracy). Longer and more complex policies consistently reduce the reliability of all systems. We release the RBAC-augmented datasets and code.
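The generator-verifier pipeline described above checks generated SQL against the role policy before permitting execution. A minimal table-level sketch of that second stage is shown below; the role names, policy mapping, and naive regex-based table extraction are illustrative assumptions, not the authors' implementation (a real verifier would use a proper SQL parser and enforce column-level grants as well):

```python
import re

# Hypothetical role policy: role -> set of readable tables (table-level RBAC).
# A column-level policy would map role -> table -> allowed columns instead.
POLICY = {
    "analyst": {"orders", "products"},
    "intern": {"products"},
}

def referenced_tables(sql: str) -> set:
    """Naively collect identifiers following FROM/JOIN.

    Illustrative only: a production verifier should use a real SQL
    parser (e.g. sqlglot) to handle aliases, subqueries, and CTEs.
    """
    pattern = r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)"
    return {m.group(1).lower()
            for m in re.finditer(pattern, sql, re.IGNORECASE)}

def verify(role: str, sql: str) -> str:
    """Verifier stage: permit the generated SQL only if every
    referenced table is granted to the requesting role."""
    allowed = POLICY.get(role, set())
    disallowed = referenced_tables(sql) - allowed
    if disallowed:
        return f"REFUSE: role '{role}' lacks access to {sorted(disallowed)}"
    return "PERMIT"
```

For example, `verify("analyst", "SELECT id FROM orders")` permits the query, while the same query under the `intern` role is refused because `orders` is outside that role's grant set; this mirrors the paper's goal of answering when authorized and refusing when not.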