🤖 AI Summary
This study addresses the high manual cost and poor maintainability of Attribute-Based Access Control (ABAC) policies at scale. It presents the first systematic evaluation of the effectiveness and limitations of large language models (LLMs) in automated ABAC policy mining. Methodologically, we develop a controllable, Python-based experimental framework that generates multi-scale randomized access logs and empirically assesses the generalization capability of mainstream LLMs, including Google Gemini (Flash and Pro) and OpenAI ChatGPT, in ABAC policy induction. Our key findings show that LLMs can generate concise and accurate policies in small-scale settings; however, as the number of subjects or objects increases, accuracy degrades significantly and policy redundancy surges, exposing fundamental limitations in the logical consistency and scalability of current LLMs. This work establishes a first benchmark analysis and offers practical cautionary insights for leveraging LLMs in access control policy engineering.
📝 Abstract
This paper presents an empirical investigation into the capabilities of Large Language Models (LLMs) for automated Attribute-Based Access Control (ABAC) policy mining. While ABAC provides fine-grained, context-aware access management, the increasing number and complexity of access policies can make their formulation and evaluation challenging. To address the task of synthesizing concise yet accurate policies, we evaluate several state-of-the-art LLMs, specifically Google Gemini (Flash and Pro) and OpenAI ChatGPT, as potential policy mining engines. An experimental framework was developed in Python to generate randomized access data parameterized by varying numbers of subjects, objects, and initial policy sets. The baseline policy sets, which govern permission decisions between subjects and objects, serve as the ground truth for comparison. Each LLM-generated policy was evaluated against the baseline policy using standard performance metrics. The results indicate that LLMs can effectively infer compact and valid ABAC policies for small-scale scenarios. However, as the system size increases, characterized by higher numbers of subjects and objects, LLM outputs exhibit declining accuracy and precision, coupled with a significant increase in the size of the generated policies, well beyond the optimal size. These findings highlight both the promise and the limitations of current LLM architectures for scalable policy mining in access control domains. Future work will explore hybrid approaches that combine prompt optimization with classical rule mining algorithms to improve scalability and interpretability in complex ABAC environments.
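The evaluation pipeline the abstract describes can be illustrated with a minimal sketch: generate random subjects and objects with attributes, treat a hand-written baseline rule set as ground truth, and score any candidate policy over every subject-object pair using accuracy, precision, and recall. Note that the attribute names (`role`, `class`), the specific rules, and the helper functions below are illustrative assumptions for exposition, not the paper's actual framework or dataset.

```python
import random
from itertools import product

def generate_entities(n_subjects, n_objects, seed=0):
    """Generate randomized subjects and objects with simple attributes
    (illustrative attribute vocabularies, not the paper's schema)."""
    rng = random.Random(seed)
    roles = ["admin", "engineer", "analyst"]
    classifications = ["public", "internal", "secret"]
    subjects = [{"id": f"s{i}", "role": rng.choice(roles)}
                for i in range(n_subjects)]
    objects_ = [{"id": f"o{j}", "class": rng.choice(classifications)}
                for j in range(n_objects)]
    return subjects, objects_

def baseline_policy(sub, obj):
    """Hand-written ground-truth rules governing permission decisions."""
    if sub["role"] == "admin":
        return True
    if sub["role"] == "engineer" and obj["class"] in ("public", "internal"):
        return True
    if sub["role"] == "analyst" and obj["class"] == "public":
        return True
    return False

def evaluate(candidate, subjects, objects_):
    """Score a candidate policy (e.g. one induced by an LLM) against the
    baseline over the full subject-object cross product."""
    tp = fp = fn = tn = 0
    for sub, obj in product(subjects, objects_):
        truth = baseline_policy(sub, obj)
        pred = candidate(sub, obj)
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

In this setup, an over-permissive candidate (e.g. `lambda s, o: True`) keeps recall at 1.0 while precision and accuracy fall, mirroring the redundancy and precision degradation the paper reports at larger scales; the baseline itself scores 1.0 on all metrics by construction.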