🤖 AI Summary
This work addresses the lack of transparency, auditability, and controllable reasoning in high-stakes decision-making with tabular data by proposing a role-structured multi-agent debate framework. Inspired by juvenile court proceedings, the framework orchestrates collaborative reasoning among three distinct roles—prosecutor, defender, and judge—through seven rounds of structured debate. Built upon large language models, it integrates role-specific definitions, private reasoning strategies, and a structured interaction protocol, while incorporating non-deployment constraints to ensure ethical compliance. Evaluated on the NLSY97 young adult recidivism prediction task, the approach demonstrates more stable and generalizable accuracy and F1 scores than single-agent baselines, and produces comprehensive, traceable decision logs that enable fine-grained behavioral analysis and effective human oversight.
📝 Abstract
We introduce AgenticSimLaw, a role-structured, multi-agent debate framework that provides transparent and controllable test-time reasoning for high-stakes tabular decision-making tasks. Unlike black-box approaches, our courtroom-style orchestration explicitly defines agent roles (prosecutor, defense, judge), interaction protocols (7-turn structured debate), and private reasoning strategies, creating a fully auditable decision-making process. We benchmark this framework on young adult recidivism prediction using the NLSY97 dataset, comparing it against traditional chain-of-thought (CoT) prompting across almost 90 unique combinations of models and strategies. Our results demonstrate that structured multi-agent debate provides more stable and generalizable performance compared to single-agent reasoning, with stronger correlation between accuracy and F1-score metrics. Beyond performance improvements, AgenticSimLaw offers fine-grained control over reasoning steps, generates complete interaction transcripts for explainability, and enables systematic profiling of agent behaviors. While we instantiate this framework in the criminal justice domain to stress-test reasoning under ethical complexity, the approach generalizes to any deliberative, high-stakes decision task requiring transparency and human oversight. This work addresses key LLM-based multi-agent system challenges: organization through structured roles, observability through logged interactions, and responsibility through explicit non-deployment constraints for sensitive domains. Data, results, and code will be available on github.com under the MIT license.
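The orchestration described above—fixed roles, a 7-turn schedule ending with the judge's verdict, and a complete logged transcript—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the turn ordering, all identifiers, and the stubbed `query_llm` call standing in for a role-conditioned LLM request are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical 7-turn schedule; the paper specifies three roles and seven
# turns, but this particular ordering is an illustrative assumption.
TURN_ORDER = [
    "prosecutor",  # opening argument
    "defender",    # opening argument
    "prosecutor",  # rebuttal
    "defender",    # rebuttal
    "prosecutor",  # closing statement
    "defender",    # closing statement
    "judge",       # final verdict
]

@dataclass
class DebateLog:
    """Full interaction transcript, kept for auditability."""
    turns: list = field(default_factory=list)

def query_llm(role: str, transcript: list, case: dict) -> str:
    """Stub standing in for a role-conditioned LLM call that sees the
    shared transcript plus the tabular case features."""
    return f"{role}: argument over {sorted(case)} at turn {len(transcript) + 1}"

def run_debate(case: dict) -> DebateLog:
    """Run the structured debate and return the complete decision log."""
    log = DebateLog()
    for role in TURN_ORDER:
        message = query_llm(role, log.turns, case)
        log.turns.append({"role": role, "content": message})
    return log

log = run_debate({"age": 19, "prior_arrests": 1})
assert len(log.turns) == 7
assert log.turns[-1]["role"] == "judge"  # verdict closes the debate
```

Because every turn is appended to an explicit log, the transcript supports the fine-grained behavioral profiling and human oversight the abstract describes.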