Knowledge Graph-Assisted LLM Post-Training for Enhanced Legal Reasoning

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited capacity of existing large language models to model structured legal knowledge during post-training, which hinders their ability to perform complex legal reasoning. To overcome this, the authors integrate a knowledge graph of 12K legal cases, constructed under the IRAC (Issue, Rule, Analysis, Conclusion) framework, into the post-training pipeline. Combining supervised fine-tuning (SFT) with direct preference optimization (DPO), they train models of varying scales (30B, 49B, and 70B parameters). The method achieves state-of-the-art performance on four of five legal benchmarks (spanning 14 tasks), with the 70B DPO model leading on four of six complex reasoning tasks, outperforming even a specialized 141B legal model, and substantially enhancing reasoning in high-stakes professional legal scenarios.

📝 Abstract
LLM post-training has primarily relied on large text corpora and human feedback, without capturing the structure of domain knowledge. As a result, models struggle with complex reasoning tasks, especially in high-stakes professional domains. In law, reasoning requires a deep understanding of the relations between legal concepts, a key component missing from current LLM post-training. In this paper, we propose a knowledge graph (KG)-assisted approach for enhancing LLMs' reasoning capability in the legal domain that is generalizable to other high-stakes domains. We model key legal concepts following the IRAC (Issue, Rule, Analysis, Conclusion) framework and construct a KG from 12K legal cases. We then produce training data from our IRAC KG and conduct both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) with three state-of-the-art (SOTA) LLMs (30B, 49B, and 70B), varying in architecture and base model family. Our post-trained models achieved better average performance than baselines on 4/5 diverse legal benchmarks (14 tasks). In particular, our 70B DPO model achieved the best score on 4/6 reasoning tasks against both baselines and a 141B SOTA legal LLM, demonstrating the effectiveness of our KG for enhancing LLMs' legal reasoning capability.
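The abstract describes a pipeline: encode cases as an IRAC-structured KG, then derive training data from it for SFT and DPO. The following is a minimal sketch of what that could look like, not the authors' actual implementation; the triple schema, relation names, and the sample case are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One edge in a toy legal KG: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str

def irac_triples(case_id, issue, rule, analysis, conclusion):
    """Encode one case's four IRAC components as KG triples (hypothetical schema)."""
    return [
        Triple(case_id, "has_issue", issue),
        Triple(case_id, "has_rule", rule),
        Triple(case_id, "has_analysis", analysis),
        Triple(case_id, "has_conclusion", conclusion),
    ]

def dpo_pair(triples):
    """Build one (prompt, chosen, rejected) preference example from a case's triples:
    the chosen response follows the full IRAC chain from the KG, while the rejected
    one jumps to the conclusion without the analysis step (a plausible reasoning
    failure DPO could be trained to disprefer)."""
    by_rel = {t.relation: t.tail for t in triples}
    prompt = f"Issue: {by_rel['has_issue']}\nApply the relevant rule and conclude."
    chosen = (f"Rule: {by_rel['has_rule']}\n"
              f"Analysis: {by_rel['has_analysis']}\n"
              f"Conclusion: {by_rel['has_conclusion']}")
    rejected = f"Conclusion: {by_rel['has_conclusion']}"  # conclusion with no reasoning
    return prompt, chosen, rejected

# Illustrative case (invented for the sketch).
triples = irac_triples(
    "case_001",
    "Is a verbal agreement to sell land enforceable?",
    "The statute of frauds requires land-sale contracts to be in writing.",
    "The agreement was verbal, so the writing requirement is not met.",
    "The agreement is unenforceable.",
)
prompt, chosen, rejected = dpo_pair(triples)
```

In this sketch the KG's role is to make the preferred response structurally grounded: every sentence in `chosen` is read off a KG edge, so preference pairs can be generated at scale without hand-writing reasoning chains.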
Problem

Research questions and friction points this paper is trying to address.

Legal Reasoning
Knowledge Graph
LLM Post-Training
Domain Knowledge
Complex Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Graph
Legal Reasoning
IRAC Framework
Direct Preference Optimization
LLM Post-Training