Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in complex legal reasoning (insufficient domain-specific knowledge, unreliable logical consistency, and poor generalization across legal tasks), this paper introduces Unilaw-R1, the first open-source 7B-parameter large language model explicitly optimized for legal reasoning. Methodologically, we propose a two-stage training paradigm: (i) supervised fine-tuning (SFT) on 17K high-quality legal chain-of-thought samples, followed by (ii) reinforcement learning (RL)-based joint optimization, augmented with an iterative reasoning mechanism. We further construct Unilaw-R1-Eval, a dedicated benchmark for rigorous evaluation. Experimental results demonstrate that Unilaw-R1 achieves an average 6.6% improvement over Qwen-2.5-7B-Instruct on LawBench and LexEval, matching the performance of the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). The model significantly enhances both accuracy and interpretability in legal reasoning tasks.
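The SFT stage described above trains on 17K distilled chain-of-thought samples. As a minimal sketch of how one such sample might be serialized for chat-style fine-tuning (the system prompt, the `<think>`/`<answer>` tag convention, and the field names are assumptions; the paper's actual data schema is not given here):

```python
import json

def format_cot_sample(question: str, chain_of_thought: str, answer: str) -> dict:
    """Serialize one legal CoT sample into a chat-style SFT record.

    The tag convention (<think>...</think> for reasoning, <answer>...</answer>
    for the final choice) is an illustrative assumption modeled on common
    R1-style distillation formats, not the paper's documented schema.
    """
    return {
        "messages": [
            {"role": "system", "content": "You are a legal reasoning assistant."},
            {"role": "user", "content": question},
            {
                "role": "assistant",
                "content": f"<think>{chain_of_thought}</think>\n<answer>{answer}</answer>",
            },
        ]
    }

# One toy record, serialized as it might appear in a JSONL training file.
sample = format_cot_sample(
    "Under contract law, which option best describes an unenforceable agreement?",
    "First identify the governing statute, then check the validity requirements.",
    "C",
)
print(json.dumps(sample, ensure_ascii=False))
```

Keeping the reasoning inside explicit tags makes it easy for the later RL stage to score format compliance separately from answer correctness.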

📝 Abstract
Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models across single- and multi-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.
Problem

Research questions and friction points this paper is trying to address.

Addresses insufficient legal knowledge in large language models
Mitigates unreliable reasoning logic on complex legal problems
Improves weak business generalization in legal AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning for legal reasoning
Employs iterative inference to improve logic
Combines supervised fine-tuning with reinforcement learning
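R1-style RL stages like the one described above typically rely on a verifiable, rule-based reward over model rollouts. The exact reward design is not specified on this page, so the format/accuracy split below is an assumption modeled on common practice for single-choice legal QA:

```python
import re

def legal_mcq_reward(completion: str, gold_choice: str) -> float:
    """Toy rule-based reward for RL on single-choice legal questions.

    Combines a small format reward (reasoning wrapped in <think>...</think>,
    answer wrapped in <answer>...</answer>) with a larger accuracy reward
    (the extracted choice matches the gold label). The 0.2/0.8 weighting is
    illustrative, not the paper's actual reward function.
    """
    reward = 0.0
    # Format reward: the completion follows the expected tag structure.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.2
    # Accuracy reward: the letter inside <answer> matches the gold choice.
    match = re.search(r"<answer>\s*([A-D])\s*</answer>", completion)
    if match and match.group(1) == gold_choice.upper():
        reward += 0.8
    return reward

# Example rollouts
good = "<think>Article 143 governs validity of the contract.</think><answer>B</answer>"
bad = "The answer is B."
print(legal_mcq_reward(good, "B"))  # 1.0
print(legal_mcq_reward(bad, "B"))   # 0.0 (correct letter, but no required tags)
```

Scoring format and accuracy separately lets the policy earn partial credit for well-structured reasoning even when the final choice is wrong, which tends to stabilize early RL training.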
Hua Cai
Thomas and Jane Schmidt Rising Star Associate Professor, Purdue University
Shared Mobility, Sustainable Systems, AI for Sustainability, Environmental & Ecological Engineering
Shuang Zhao
UniDT
Liang Zhang
UniDT
Xuli Shen
Fudan University
Computer Vision
Qing Xu
UniDT
Weilin Shen
UniDT
Zihao Wen
Fudan University
Tianke Ban
Fudan University