Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning

📅 2024-12-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing legal large language models (LLMs) lack the three-stage criminal law reasoning (elements of the offense, unlawfulness, and culpability) needed for judgment prediction, so they assign a charge to every input and cannot predict innocent verdicts, which limits their practical use. Method: The paper introduces LJPIV, the first benchmark dataset for legal judgment prediction with innocent verdicts, built on the trichotomous dogmatics of criminal law by extending three widely used legal datasets through LLM-based augmentation and manual verification. It also proposes strategies that integrate trichotomous reasoning into zero-shot prompting and fine-tuning. Contribution/Results: Even the best current legal LLMs achieve an F1 score below 0.3 on LJPIV; the proposed strategies notably improve in-domain and cross-domain prediction accuracy, especially for cases ending in an innocent verdict.

📝 Abstract

In legal practice, judges apply the trichotomous dogmatics of criminal law, sequentially assessing the elements of the offense, unlawfulness, and culpability to determine whether an individual's conduct constitutes a crime. Although current legal large language models (LLMs) show promising accuracy in judgment prediction, they lack trichotomous reasoning capabilities due to the absence of an appropriate benchmark dataset, preventing them from predicting innocent outcomes. As a result, every input is automatically assigned a charge, limiting their practical utility in legal contexts. To bridge this gap, we introduce LJPIV, the first benchmark dataset for Legal Judgment Prediction with Innocent Verdicts. Adhering to the trichotomous dogmatics, we extend three widely-used legal datasets through LLM-based augmentation and manual verification. Our experiments with state-of-the-art legal LLMs and novel strategies that integrate trichotomous reasoning into zero-shot prompting and fine-tuning reveal: (1) current legal LLMs have significant room for improvement, with even the best models achieving an F1 score of less than 0.3 on LJPIV; and (2) our strategies notably enhance both in-domain and cross-domain judgment prediction accuracy, especially for cases resulting in an innocent verdict.
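
The sequential assessment described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `StageFindings` and `predict_verdict` are hypothetical, and the three boolean findings stand in for whatever a model or a judge concludes at each stage. The only point shown is that a charge is returned only if all three stages are satisfied, and that failing any stage yields an innocent verdict.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Stage order follows the abstract: elements of the offense,
# then unlawfulness, then culpability.
STAGES = ("elements", "unlawfulness", "culpability")


@dataclass
class StageFindings:
    """Hypothetical container for the outcome of each stage."""
    elements: bool       # conduct fulfils the elements of the offense
    unlawfulness: bool   # no justification (e.g. self-defence) applies
    culpability: bool    # the person can be held personally responsible


def predict_verdict(findings: StageFindings, charge: str) -> Tuple[str, Optional[str]]:
    """Return (verdict, failed_stage); failed_stage is None for a conviction."""
    for stage in STAGES:
        if not getattr(findings, stage):
            # The first failed stage ends the assessment with an innocent verdict.
            return "innocent", stage
    return charge, None


# Elements are met, but the act was justified, so the second stage fails.
print(predict_verdict(StageFindings(True, False, True), "intentional injury"))
```

A model that skips these checks behaves like `predict_verdict` with all three findings fixed to `True`, which is the "every input is assigned a charge" behaviour the abstract criticizes.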

Problem

Research questions and friction points this paper is trying to address.

Lack of trichotomous reasoning in legal LLMs

Absence of benchmark dataset for innocent verdicts

Low accuracy in predicting innocent outcomes
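
To make the last point concrete, the sketch below computes a per-class F1 score for the innocent label. It rests on assumptions: the page does not say how the paper aggregates F1, the function name `class_f1` is hypothetical, and the labels are toy data invented for illustration, not results from LJPIV.

```python
from typing import Sequence


def class_f1(gold: Sequence[str], pred: Sequence[str], label: str = "innocent") -> float:
    """F1 for a single label, treating every other label as negative."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        # No correct prediction of the label: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only).
gold = ["innocent", "theft", "innocent", "fraud", "innocent"]
pred = ["theft", "theft", "innocent", "fraud", "fraud"]
always_charge = ["theft", "theft", "theft", "fraud", "fraud"]

print(class_f1(gold, pred))           # one of three innocent cases found
print(class_f1(gold, always_charge))  # a model that never predicts "innocent"
```

A model that always assigns a charge scores zero on this metric regardless of how accurate its charge predictions are, which is why a benchmark without innocent cases cannot expose the problem.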

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trichotomous reasoning integration

LLM-based dataset augmentation

Manual verification enhancement

🔎 Similar Papers

LegalDuet: Learning Effective Representations for Legal Judgment Prediction through a Dual-View Legal Clue Reasoning