AI Summary
This work addresses a critical limitation of existing legal large language models: they frequently disregard the temporal applicability of law. This often leads to erroneous reasoning, because models anchored to their training cutoff apply statute versions that postdate the case or otherwise do not match its time context. To resolve this, we propose LegalSearch-R1, a framework that, for the first time, integrates temporal consistency constraints into legal agent search. It combines RAG-based retrieval of local statutes with online web search, and uses end-to-end reinforcement learning to jointly optimize precise statute matching and broad legal knowledge acquisition over temporally annotated data spanning multiple legislative revision cycles. Experiments show that our 7B-parameter model outperforms state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8% across 13 legal tasks, surpasses baselines by 57.7% to 80.3% on temporal consistency metrics, and generalizes well out of domain.
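The temporal constraint described above amounts to a point-in-time lookup: for a given case date, use the statute version that was already in force on that date, never a later amendment. The sketch below illustrates that idea under stated assumptions; `StatuteVersion`, `version_in_force`, and the sample article history are hypothetical names and data, not taken from the LegalSearch-R1 code. It assumes all versions passed in belong to one article and ignores outright repeal.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatuteVersion:
    """One amendment period of a statute article (hypothetical schema)."""
    article_id: str
    effective_from: date  # first day this text applies
    text: str


def version_in_force(versions: List[StatuteVersion], case_date: date) -> Optional[StatuteVersion]:
    """Return the latest version that took effect on or before case_date.

    Returns None when the case predates every known version, rather than
    falling back to the newest text, which would apply the law retroactively.
    """
    ordered = sorted(versions, key=lambda v: v.effective_from)
    starts = [v.effective_from for v in ordered]
    # bisect_right returns the number of versions with effective_from <= case_date
    idx = bisect_right(starts, case_date)
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical amendment history of a single article, deliberately unsorted.
history = [
    StatuteVersion("A1", date(2011, 5, 1), "text of the 2011 amendment"),
    StatuteVersion("A1", date(1997, 10, 1), "text of the 1997 version"),
    StatuteVersion("A1", date(2021, 3, 1), "text of the 2021 amendment"),
]

print(version_in_force(history, date(2015, 6, 30)).effective_from)  # 2011-05-01
print(version_in_force(history, date(1990, 1, 1)))  # None
```

Sorting by effective date and bisecting keeps the lookup simple; a case dated 2015 resolves to the 2011 text even though a newer 2021 amendment exists in the data.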
Abstract
While large language models (LLMs) augmented with agentic search capabilities show promise for legal reasoning, they overlook a fundamental constraint: the applicable law must match the temporal context of each case, because retroactive application of statutes violates core legal principles and leads to erroneous conclusions. Our observations reveal three problems: current legal LLMs suffer from a temporal bias anchored to their training cutoff, search agents rarely incorporate temporal constraints into their queries, and web search alone cannot provide the precise statute and precedent citations that legal reasoning demands. To address these challenges, we propose LegalSearch-R1, an end-to-end reinforcement learning framework that pairs local statute RAG for precise article matching with online web search for broader legal knowledge, and is trained on temporally indexed data spanning multiple amendment periods to enforce temporal consistency. Extensive experiments on our benchmark covering 13 legal tasks demonstrate that our 7B-parameter agent outperforms state-of-the-art deep research frameworks and specialized legal LLMs by 12.9% to 29.8%, surpasses baselines by 57.7% to 80.3% on temporal consistency, and exhibits robust out-of-domain generalization. The code and data are available at https://github.com/AlexFanw/LegalSearch-R1.
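Temporal consistency, as used in the abstract, can be made concrete in several ways. The sketch below shows one hypothetical scoring rule, not the paper's definition: it counts the fraction of statute versions cited in an answer whose validity interval covers the case date, so a retroactively applied amendment counts as an error. `CitedVersion`, `in_force`, `temporal_consistency`, and the sample citations are assumed names and data for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CitedVersion:
    """A statute version cited in a model's answer (hypothetical schema)."""
    article_id: str
    effective_from: date
    repealed_on: Optional[date] = None  # None: still in force


def in_force(v: CitedVersion, case_date: date) -> bool:
    """True if the half-open interval [effective_from, repealed_on) contains case_date."""
    started = v.effective_from <= case_date
    not_ended = v.repealed_on is None or case_date < v.repealed_on
    return started and not_ended


def temporal_consistency(citations: List[CitedVersion], case_date: date) -> float:
    """Fraction of cited versions that were in force on the case date.

    An answer with no citations scores 0.0 (an assumed convention).
    """
    if not citations:
        return 0.0
    return sum(in_force(c, case_date) for c in citations) / len(citations)


case_date = date(2015, 6, 30)
citations = [
    CitedVersion("A1", date(2011, 5, 1), repealed_on=date(2021, 3, 1)),  # applicable in 2015
    CitedVersion("A1", date(2021, 3, 1)),  # enacted after the case: retroactive
]
print(temporal_consistency(citations, case_date))  # 0.5
```

The interval is half-open, so on the day an amendment takes effect the new version, not the old one, counts as in force. Scoring an answer with no citations as 0.0 is likewise an assumed choice; a different evaluation could reasonably treat it as vacuously consistent.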