🤖 AI Summary
To address three key challenges in complex legal reasoning—insufficient domain-specific knowledge, unreliable logical consistency, and poor generalization across legal tasks—this paper introduces Unilaw-R1, an open-source 7B-parameter large language model explicitly optimized for legal reasoning. Methodologically, the authors propose a two-stage training paradigm: (i) supervised fine-tuning (SFT) on 17K high-quality legal chain-of-thought samples, followed by (ii) reinforcement learning (RL)-based joint optimization with an iterative reasoning mechanism. They further construct Unilaw-R1-Eval, a dedicated benchmark for rigorous evaluation. Experimental results show that Unilaw-R1 achieves an average 6.6% improvement over Qwen-2.5-7B-Instruct on LawBench and LexEval, matching the performance of the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). The model significantly improves both accuracy and interpretability on legal reasoning tasks.
📝 Abstract
Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models on single- and multi-choice legal tasks. Unilaw-R1 delivers strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.