🤖 AI Summary
To address the high logical complexity of authorization rules and the scarcity of high-quality training data for translating natural-language rules into ODRL policies, this paper proposes AgentODRL, an LLM-based multi-agent system built on an Orchestrator-Workers architecture. The Orchestrator routes each use case through specialized Workers that decompose complex inputs, rewrite nested logic, and generate the policy. The Generator is paired with a syntactic validator and a semantic reflection module backed by a LoRA-fine-tuned model, yielding end-to-end ODRL policy generation. Evaluated on a newly constructed benchmark of 770 data-space use cases of varying complexity, the system outperforms existing methods in both syntactic correctness and semantic fidelity. To the best of the authors' knowledge, it is the first approach to enable robust, interpretable, and formally verifiable automated conversion of complex authorization rules into ODRL policies.
📝 Abstract
The Open Digital Rights Language (ODRL) is a pivotal standard for automating data rights management. However, the inherent logical complexity of authorization policies, combined with the scarcity of high-quality "Natural Language-to-ODRL" training datasets, impedes the ability of current methods to efficiently and accurately translate complex rules from natural language into the ODRL format. To address this challenge, this research leverages the potent comprehension and generation capabilities of Large Language Models (LLMs) to achieve both automation and high fidelity in this translation process. We introduce AgentODRL, a multi-agent system based on an Orchestrator-Workers architecture. The architecture consists of specialized Workers, including a Generator for ODRL policy creation, a Decomposer for breaking down complex use cases, and a Rewriter for simplifying nested logical relationships. The Orchestrator agent dynamically coordinates these Workers, assembling an optimal pathway based on the complexity of the input use case. Specifically, we enhance the ODRL Generator by incorporating a validator-based syntax strategy and a semantic reflection mechanism powered by a LoRA-finetuned model, significantly elevating the quality of the generated policies. Extensive experiments were conducted on a newly constructed dataset comprising 770 use cases of varying complexity, all situated within the context of data spaces. The results, evaluated using ODRL syntax and semantic scores, demonstrate that our proposed Orchestrator-Workers system, enhanced with these strategies, achieves superior performance on the ODRL generation task.
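The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: every function name, the three complexity labels, and the keyword heuristics below are hypothetical stand-ins for what would be LLM calls in AgentODRL, and the validator checks only a few structural requirements (a known policy type, a `uid`, at least one rule, an `action` per rule) rather than full ODRL conformance. The example URIs are placeholders.

```python
from typing import List

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"

# Hypothetical complexity labels; the abstract only states that the
# Orchestrator picks a pathway based on the complexity of the use case.
SIMPLE, MULTI_POLICY, NESTED = "simple", "multi_policy", "nested"


def classify(use_case: str) -> str:
    """Toy stand-in for the Orchestrator's (LLM-based) complexity judgment."""
    text = use_case.lower()
    if " unless " in text:
        return NESTED
    if ";" in text:
        return MULTI_POLICY
    return SIMPLE


def decompose(use_case: str) -> List[str]:
    """Toy Decomposer: split a compound use case into independent rules."""
    return [part.strip() for part in use_case.split(";") if part.strip()]


def rewrite(use_case: str) -> str:
    """Toy Rewriter: restate one nested pattern as a plainer condition."""
    return use_case.replace(" unless ", " only if it is not the case that ")


def generate(rule: str, index: int) -> dict:
    """Toy Generator: emit a minimal ODRL Set policy for one rule."""
    action = "distribute" if "distribute" in rule.lower() else "use"
    return {
        "@context": ODRL_CONTEXT,
        "@type": "Set",
        "uid": f"http://example.com/policy/{index}",  # placeholder URI
        "permission": [
            {"target": "http://example.com/asset/1", "action": action}
        ],
    }


def validate(policy: dict) -> List[str]:
    """Partial structural check; returns a list of error messages."""
    errors = []
    if policy.get("@type") not in {"Set", "Offer", "Agreement"}:
        errors.append("unknown policy @type")
    if "uid" not in policy:
        errors.append("missing uid")
    rules = [
        rule
        for key in ("permission", "prohibition", "obligation")
        for rule in policy.get(key, [])
    ]
    if not rules:
        errors.append("policy has no rules")
    for rule in rules:
        if "action" not in rule:
            errors.append("rule missing action")
    return errors


def orchestrate(use_case: str) -> List[dict]:
    """Choose a Worker pathway by complexity, then generate and validate."""
    kind = classify(use_case)
    rules = decompose(use_case) if kind == MULTI_POLICY else [use_case]
    if kind == NESTED:
        rules = [rewrite(rule) for rule in rules]
    policies = []
    for index, rule in enumerate(rules, start=1):
        policy = generate(rule, index)
        errors = validate(policy)
        if errors:
            # A real system would presumably feed the errors back to the
            # Generator and retry; this sketch simply stops.
            raise ValueError(f"invalid policy for {rule!r}: {errors}")
        policies.append(policy)
    return policies
```

Calling `orchestrate("Partner A may use the dataset; Partner B may distribute the dataset")` takes the decomposition pathway and returns two policies, one per rule. The semantic reflection step is omitted here because the abstract does not describe how it is scored or applied.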