🤖 AI Summary
This work addresses the lack of customizable, large-scale synthetic dataset generators for attribute-based access control (ABAC) systems. To bridge this gap, we propose MuSimA, a web-based tool that accepts multimodal input and combines large language models (LLMs) with probabilistic distribution modeling. Users specify attribute value distributions either through structured JSON configurations or through hand-drawn distribution sketches, and MuSimA generates ABAC datasets that follow these specifications. Built using modern web frontend technologies and released as open-source software, MuSimA generates datasets on demand at varying scales and complexities, which supports scalability studies of ABAC algorithms.
📝 Abstract
Recent advances in research on Attribute-based Access Control (ABAC) have led to the development of several ingenious methods for representing and enforcing organizational security policies. However, little effort has so far been spent on building a tool for generating large-scale synthetic datasets that can be used to test the developed ABAC systems. In this paper, we address this shortcoming by building MuSimA, a web-based tool for generating ABAC datasets with user-specified probability distributions of attribute values. It supports multimodal input, i.e., users can provide specifications either as a structured JSON file or as a minimal JSON file combined with hand-drawn distribution sketches. In the latter case, a Large Language Model is used to automatically extract appropriate distribution parameters from the sketches. The generated synthetic ABAC data matching the input specifications can be downloaded by the user. For studying the scalability of algorithms and methods related to ABAC, data can be generated at varying sizes and complexities. We make MuSimA freely available for use by the research community.
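To make the idea of a JSON-specified attribute distribution concrete, the following is a minimal sketch of how user attributes could be sampled from such a specification. The schema (`num_users`, `attributes`, `values`, `weights`), the attribute names, and the function `generate_users` are all hypothetical, chosen only for illustration. The abstract does not describe MuSimA's actual input format or implementation, and this sketch covers only categorical distributions.

```python
import json
import random

# Hypothetical specification: this schema is an assumption for illustration,
# not MuSimA's actual JSON format.
SPEC_JSON = """
{
  "num_users": 1000,
  "attributes": {
    "department": {"values": ["sales", "engineering", "hr"],
                   "weights": [0.5, 0.3, 0.2]},
    "clearance":  {"values": ["low", "medium", "high"],
                   "weights": [0.6, 0.3, 0.1]}
  }
}
"""


def generate_users(spec, seed=0):
    """Sample one value per attribute for each user.

    Each attribute is drawn independently from the categorical
    distribution given by its "values" and "weights" entries.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    users = []
    for i in range(spec["num_users"]):
        user = {"id": f"u{i}"}
        for name, dist in spec["attributes"].items():
            user[name] = rng.choices(dist["values"], weights=dist["weights"], k=1)[0]
        users.append(user)
    return users


spec = json.loads(SPEC_JSON)
users = generate_users(spec)
```

Changing `num_users` in the specification scales the generated dataset, which is the kind of size variation the abstract mentions for scalability studies. In the sketch-based input mode, the weights would presumably come from parameters extracted by the LLM rather than being written by hand.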