🤖 AI Summary
Automated extraction of implicit legal principles from judicial precedents remains challenging due to their abstract, normative, and context-dependent nature. Method: This paper formally defines “Legal Rule Induction” (LRI) as the task of distilling generalizable, transferable doctrinal rules from sets of analogous cases. To support this, we introduce the first large-scale Chinese analogous-case benchmark—comprising 5,121 case groups (38,088 individual judgments) and a gold-standard test set of 216 expert-annotated groups—thereby filling a critical gap in modeling and evaluating principle discovery for legal AI. We propose an LLM-based LRI framework integrating case alignment, condition-behavior-consequence (CBC) structural distillation, and hallucination-mitigating fine-tuning. Results: Our method achieves significant improvements over baselines across rule generalizability, factual accuracy, and output stability, with an average multi-dimensional gain of 27.4%, effectively curbing overgeneralization and factual hallucination.
📝 Abstract
Legal rules encompass not only codified statutes but also implicit adjudicatory principles derived from precedents that contain discretionary norms, social morality, and policy. While computational legal research has advanced in applying established rules to cases, inducing legal rules from judicial decisions remains understudied, constrained by limitations in model inference efficacy and symbolic reasoning capability. The advent of Large Language Models (LLMs) offers unprecedented opportunities for automating the extraction of such latent principles, yet progress is stymied by the absence of formal task definitions, benchmark datasets, and methodologies. To address this gap, we formalize Legal Rule Induction (LRI) as the task of deriving concise, generalizable doctrinal rules from sets of analogous precedents, distilling their shared preconditions, normative behaviors, and legal consequences. We introduce the first LRI benchmark, comprising 5,121 case sets (38,088 Chinese cases in total) for model tuning, plus a gold-standard test set of 216 expert-annotated case sets. Experimental results reveal that: 1) State-of-the-art LLMs struggle with over-generalization and hallucination; 2) Training on our dataset markedly enhances LLMs' capabilities in capturing nuanced rule patterns across similar cases.
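The condition-behavior-consequence structure that the LRI task distills (shared preconditions, normative behaviors, and legal consequences) can be sketched as a minimal data model. This is an illustrative assumption of how such a rule might be represented, not the paper's actual implementation; the class name `LegalRule`, its fields, and the example rule are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalRule:
    """Hypothetical CBC representation of one induced doctrinal rule."""

    precondition: str  # shared factual condition across the analogous cases
    behavior: str      # the normative behavior the rule governs
    consequence: str   # the legal consequence the rule attaches

    def to_text(self) -> str:
        # Render the triple as a single generalizable rule statement.
        return (
            f"If {self.precondition}, and a party {self.behavior}, "
            f"then {self.consequence}."
        )


# Toy example of a rule a model might induce from a case set.
rule = LegalRule(
    precondition="a contract was concluded under duress",
    behavior="petitions for rescission within the statutory period",
    consequence="the court may rescind the contract",
)
print(rule.to_text())
```

Structuring outputs this way makes the three components separately checkable, which fits the paper's framing: over-generalization shows up as an overly broad precondition, and hallucination as a consequence unsupported by the source cases.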