MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing legal AI benchmarks lack targeted evaluation of multi-agent systems (MAS) on core capabilities such as task decomposition, role specialization, and dynamic collaboration. Method: We introduce the first MAS-specific benchmark for deductive legal reasoning, grounded in the GDPR. It features a role-based agent architecture and multi-level reasoning tasks: large language models receive legally grounded role assignments and are evaluated on manually curated normative scenarios and rigorous logical reasoning challenges. Contribution/Results: Our systematic evaluation of mainstream models reveals critical bottlenecks in task decomposition accuracy, role consistency, and cross-agent logical alignment during collaborative legal reasoning. This work fills a fundamental gap in MAS evaluation for legal AI and establishes a standardized, empirically validated assessment framework to advance the development of interpretable and collaborative legal agents.

📝 Abstract
Multi-agent systems (MAS), leveraging the remarkable capabilities of Large Language Models (LLMs), show great potential in addressing complex tasks. In this context, integrating MAS with legal tasks is a crucial step. While previous studies have developed legal benchmarks for LLM agents, none are specifically designed to consider the unique advantages of MAS, such as task decomposition, agent specialization, and flexible training. In fact, the lack of evaluation methods limits the potential of MAS in the legal domain. To address this gap, we propose MASLegalBench, a legal benchmark tailored for MAS and designed with a deductive reasoning approach. Our benchmark uses GDPR as the application scenario, encompassing extensive background knowledge and covering complex reasoning processes that effectively reflect the intricacies of real-world legal situations. Furthermore, we manually design various role-based MAS and conduct extensive experiments using different state-of-the-art LLMs. Our results highlight the strengths, limitations, and potential areas for improvement of existing models and MAS architectures.
Problem

Research questions and friction points this paper is trying to address.

Developing legal benchmarks for multi-agent systems
Addressing lack of evaluation methods in legal domain
Focusing on GDPR scenarios with deductive reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tailored legal benchmark for multi-agent systems
Uses GDPR scenario with deductive reasoning approach
Manually designs role-based agents for experiments
Huihao Jing
Hong Kong University of Science and Technology
Wenbin Hu
School of Computer, Wuhan University
Artificial Intelligence, Intelligent Optimization and Simulation, Intelligent Transportation Science, Complex System and Social N…
Hongyu Luo
Tsinghua University
Jianhui Yang
Tsinghua University
Wei Fan
Tsinghua University
Haoran Li
Tsinghua University
Yangqiu Song
HKUST
Artificial Intelligence, Data Mining, Natural Language Processing, Knowledge Graphs, Commonsense Reasoning