Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning

📅 2026-05-25
🤖 AI Summary
This work addresses a critical limitation of existing legal large language models: they frequently disregard temporal consistency in legal applicability, which leads to erroneous reasoning when a model applies the statute version anchored to its training cutoff, or any version mismatched to the time context of the case. To resolve this, the authors propose LegalSearch-R1, a framework that integrates temporal consistency constraints into legal agentic search. It combines RAG-based retrieval over a local statute collection with online web search, and uses end-to-end reinforcement learning to jointly optimize precise statute matching and broad legal knowledge acquisition, training on temporally indexed data that spans multiple amendment periods. Experiments show that the 7B-parameter agent improves on state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8% on a benchmark covering 13 legal tasks, surpasses baselines by 57.7% to 80.3% on temporal consistency, and generalizes robustly out of domain.
📝 Abstract
While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint: applicable law must match the temporal context of each case, because retroactive application of statutes violates core legal principles and leads to erroneous conclusions. Our observations reveal that current legal LLMs suffer from temporal bias anchored to their training cutoff, that search agents rarely incorporate temporal constraints into queries, and that web search alone cannot provide the precise statute and precedent citations that legal reasoning demands. To address these challenges, we propose LegalSearch-R1, an end-to-end reinforcement learning framework that pairs local statute RAG for precise article matching with online web search for broader legal knowledge, trained on temporally indexed data spanning multiple amendment periods to enforce temporal consistency. Extensive experiments on our benchmark covering 13 legal tasks demonstrate that our 7B-parameter agent outperforms state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8%, surpasses baselines by 57.7% to 80.3% on temporal consistency, and exhibits robust out-of-domain generalization. The code and data are available at https://github.com/AlexFanw/LegalSearch-R1.
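The temporal constraint described in the abstract (applicable law must match the time context of the case) can be illustrated with a minimal sketch. This is not the authors' implementation: the `StatuteVersion` record, the `version_in_force` helper, the half-open validity interval, and the sample data are all hypothetical, chosen only to show what selecting a statute version by case date could look like over a temporally indexed statute collection.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class StatuteVersion:
    """One revision of a statute article (hypothetical schema)."""
    article_id: str
    text: str
    effective_from: date
    effective_to: Optional[date]  # None means the version is still in force


def version_in_force(
    versions: Iterable[StatuteVersion], article_id: str, case_date: date
) -> Optional[StatuteVersion]:
    """Return the revision valid on case_date, or None if none applies.

    Validity is treated as the half-open interval
    [effective_from, effective_to), an assumption made for this sketch.
    """
    for v in versions:
        if v.article_id != article_id:
            continue
        started = v.effective_from <= case_date
        ended = v.effective_to is not None and case_date >= v.effective_to
        if started and not ended:
            return v
    return None


# Hypothetical sample data: one article amended on 2021-01-01.
VERSIONS = [
    StatuteVersion("Art. 10", "old wording", date(2015, 1, 1), date(2021, 1, 1)),
    StatuteVersion("Art. 10", "amended wording", date(2021, 1, 1), None),
]

if __name__ == "__main__":
    for d in (date(2019, 6, 1), date(2022, 3, 15), date(2010, 1, 1)):
        hit = version_in_force(VERSIONS, "Art. 10", d)
        print(d.isoformat(), "->", hit.text if hit else "no version in force")
```

A case dated before the amendment resolves to the earlier wording, a later case resolves to the amended wording, and a case that predates the statute resolves to nothing, so a retrieval step built this way never retroactively applies a later revision.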
Problem

Research questions and friction points this paper is trying to address.

temporal consistency
legal reasoning
large language models
statute citation
temporal bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Consistency
Legal Reasoning
Reinforcement Learning
Retrieval-Augmented Generation (RAG)
Agentic Search
Wei Fan
Hong Kong University of Science and Technology
Artificial Intelligence
Yining Zhou
Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China
Mufan Zhang
Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China
Yanbing Weng
Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China
Yiran Hu
Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
Tianshi Zheng
HKUST
Natural Language Processing, Logical Inference, Scientific Discovery, Research Agent
Baixuan Xu
Hong Kong University of Science and Technology
Long-horizon Agent, Multimodal Understanding
Chunyang Li
MPhil in CSE, HKUST
Natural Language Processing
Jianhui Yang
School of Law, Tsinghua University, Beijing, China
Haoran Li
University of Science and Technology of China
3D Generation, 3D Editing, 3D Understanding
Yangqiu Song
HKUST
Artificial Intelligence, Data Mining, Natural Language Processing, Knowledge Graphs, Commonsense Reasoning