🤖 AI Summary
Legal sentencing prediction is difficult because it requires both fine-grained objective knowledge and flexible subjective reasoning. This work proposes the MSR² framework, which combines multi-source retrieval with the reasoning capabilities of large language models and trains them with reinforcement learning, using a process-level reward to guide intermediate reasoning steps. On two real-world sentencing datasets, the approach improves both prediction accuracy and interpretability. The core contribution is the joint design of reasoning-driven multi-source retrieval and process-level supervision, which the authors present as a promising step toward practical legal AI.
📝 Abstract
Legal judgment prediction (LJP) aims to predict judicial outcomes from case facts and typically includes law article, charge, and sentencing prediction. While recent methods perform well on the first two subtasks, legal sentencing prediction (LSP) remains difficult because it requires both fine-grained objective knowledge and flexible subjective reasoning. To address this difficulty, we propose $MSR^2$, a framework that integrates multi-source retrieval and reasoning in LLMs with reinforcement learning. $MSR^2$ enables LLMs to perform multi-source retrieval based on reasoning needs and applies a process-level reward to guide intermediate subjective reasoning steps. Experiments on two real-world datasets show that $MSR^2$ improves both accuracy and interpretability in LSP, providing a promising step toward practical legal AI. Our code is available at https://anonymous.4open.science/r/MSR2-FC3B.