Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a critical limitation of existing legal large language models: they frequently disregard the temporal consistency of applicable law, reasoning from statute versions that were not in force at the time of the case, typically because the model is anchored to the law as of its training cutoff. To resolve this, the authors propose LegalSearch-R1, a framework that builds temporal consistency constraints into legal agentic search. It combines RAG-based retrieval over a local statute corpus with online web search, and uses end-to-end reinforcement learning to jointly optimize precise statute matching and broad legal knowledge acquisition, training on temporally indexed data that spans multiple amendment periods. On a benchmark covering 13 legal tasks, the 7B-parameter agent outperforms state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8%, surpasses baselines by 57.7% to 80.3% on temporal consistency, and shows robust out-of-domain generalization.

📝 Abstract

While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint that applicable law must match the temporal context of each case, as retroactive application of statutes violates core legal principles and leads to erroneous conclusions. Our observations reveal that current legal LLMs suffer from temporal bias anchored to their training cutoff, while search agents rarely incorporate temporal constraints into queries, and that web search alone cannot provide the precise statute and precedent citations that legal reasoning demands. To address these challenges, we propose LegalSearch-R1, an end-to-end reinforcement learning framework that pairs local statute RAG for precise article matching with online web search for broader legal knowledge, trained on temporally-indexed data spanning multiple amendment periods to enforce temporal consistency. Extensive experiments on our benchmark covering 13 legal tasks demonstrate that our 7B-parameter agent outperforms state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8%, surpasses baselines by 57.7% to 80.3% on temporal consistency, and exhibits robust out-of-domain generalization. The code and data are available at https://github.com/AlexFanw/LegalSearch-R1.
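The temporal constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `StatuteVersion` record, the helper names, the half-open validity interval, and the sample article are all assumptions made for illustration. It shows how a local statute store might return the version of an article in force on a case date, and how a simple consistency score could count the cited versions that were valid on that date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatuteVersion:
    """One amendment period of an article (hypothetical schema)."""
    article_id: str
    text: str
    effective_from: date
    effective_to: Optional[date] = None  # None: still in force; otherwise exclusive end

    def in_force_on(self, day: date) -> bool:
        # Valid on [effective_from, effective_to), an assumed convention.
        if day < self.effective_from:
            return False
        return self.effective_to is None or day < self.effective_to


def version_in_force(
    versions: List[StatuteVersion], article_id: str, case_date: date
) -> Optional[StatuteVersion]:
    """Return the version of an article applicable on the case date, if any."""
    for version in versions:
        if version.article_id == article_id and version.in_force_on(case_date):
            return version
    return None


def temporal_consistency(cited: List[StatuteVersion], case_date: date) -> float:
    """Fraction of cited versions that were in force on the case date."""
    if not cited:
        return 0.0
    return sum(v.in_force_on(case_date) for v in cited) / len(cited)


# Hypothetical article with two amendment periods.
OLD = StatuteVersion("art-10", "old wording", date(2015, 1, 1), date(2021, 1, 1))
NEW = StatuteVersion("art-10", "amended wording", date(2021, 1, 1))
VERSIONS = [OLD, NEW]

case_date = date(2019, 6, 1)
applicable = version_in_force(VERSIONS, "art-10", case_date)
score = temporal_consistency([OLD, NEW], case_date)
```

In this example a case dated 2019 resolves to the pre-amendment wording, and citing both versions yields a score of 0.5. A model anchored to its training cutoff would tend to cite only the amended wording, which is the failure mode the abstract describes.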

Problem

Research questions and friction points this paper is trying to address.

Temporal Consistency

Legal Reasoning

Large Language Models

Statute Citation

Temporal Bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Consistency

Legal Reasoning

Reinforcement Learning