🤖 AI Summary
This work addresses the limitations of existing legal judgment prediction methods, which typically rely on static, single-step reasoning and lack both verifiable inference processes and adaptability to judicial evolution. To overcome these challenges, the authors propose VERDICT, a self-refining multi-agent collaborative framework that emulates the role division of a judicial panel. The approach generates verifiable reasoning chains through a traceable draft–verify–revise workflow and introduces a Hybrid Jurisprudential Memory (HJM) mechanism coupled with a Micro-Directive Paradigm, which distills multi-agent interaction trajectories into updatable and transferable Micro-Directives, enabling continual learning across cases. Evaluated on the CAIL2018 benchmark, the method achieves state-of-the-art performance and demonstrates strong generalization on the strictly time-partitioned CJO2025 dataset.
📝 Abstract
Legal Judgment Prediction (LJP) predicts applicable law articles, charges, and penalty terms from case facts. Beyond accuracy, LJP calls for intrinsically interpretable and legally grounded reasoning that can reconcile statutory rules with precedent-informed standards. However, existing methods often behave as static, one-shot predictors, providing limited procedural support for verifiable reasoning and little capability to adapt as jurisprudential practice evolves. We propose VERDICT, a self-refining collaborative multi-agent framework that simulates a virtual collegial panel. VERDICT assigns specialized agents to complementary roles (e.g., fact structuring, legal retrieval, opinion drafting, and supervisory verification) and coordinates them in a traceable draft–verify–revise workflow with explicit Pass/Reject feedback, producing verifiable reasoning traces and revision rationales. To capture evolving case experience, we further introduce a Hybrid Jurisprudential Memory (HJM) grounded in the Micro-Directive Paradigm, which stores precedent standards and continually distills validated multi-agent verification trajectories into updated Micro-Directives for continual learning across cases. We evaluate VERDICT on CAIL2018 and a newly constructed CJO2025 dataset with a strict future time-split for temporal generalization. VERDICT achieves state-of-the-art performance on CAIL2018 and demonstrates strong generalization on CJO2025. To facilitate reproducibility and further research, we release our code and the dataset at https://anonymous.4open.science/r/ARR-4437.
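The abstract's draft–verify–revise workflow with explicit Pass/Reject feedback can be sketched as a simple control loop. This is a minimal illustration based only on the abstract: the agent stubs, and all names such as `draft_opinion`, `verify`, `adjudicate`, and `JurisprudentialMemory` are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """A drafted judgment: cited articles, predicted charge, revision round."""
    articles: list
    charge: str
    revision: int = 0

@dataclass
class JurisprudentialMemory:
    """Stand-in for the Hybrid Jurisprudential Memory (HJM)."""
    micro_directives: list = field(default_factory=list)

    def distill(self, trajectory):
        # The paper distills validated verification trajectories into
        # updated Micro-Directives; here we just record a rule-like string.
        self.micro_directives.append(
            f"if facts match '{trajectory['facts']}' then charge '{trajectory['charge']}'"
        )

def draft_opinion(facts, memory, revision):
    # Drafting agent (stub): revises its charge after a rejection.
    charge = "theft" if revision > 0 else "fraud"
    return Draft(articles=["Art. 264"], charge=charge, revision=revision)

def verify(draft):
    # Supervisory agent (stub): issues explicit Pass/Reject feedback
    # with a revision rationale on rejection.
    if draft.charge == "theft":
        return "Pass", None
    return "Reject", "charge inconsistent with structured facts"

def adjudicate(facts, memory, max_rounds=3):
    """Run the draft–verify–revise loop, keeping a verifiable trace."""
    trace = []
    draft = None
    for r in range(max_rounds):
        draft = draft_opinion(facts, memory, r)
        verdict, rationale = verify(draft)
        trace.append((r, draft.charge, verdict, rationale))
        if verdict == "Pass":
            # Only validated trajectories are distilled into memory.
            memory.distill({"facts": facts, "charge": draft.charge})
            break
    return draft, trace
```

For example, `adjudicate("took property secretly", JurisprudentialMemory())` would reject the first draft, pass the revised one, and leave one distilled Micro-Directive in memory; the returned trace records every round and rationale.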