🤖 AI Summary
In legal retrieval, statute retrieval and precedent retrieval are typically modeled separately, overlooking the semantic and citation interdependencies they exhibit in judicial practice. To address this, we introduce IL-PCR, the first Indian legal corpus explicitly designed for joint modeling of both tasks, and establish a unified benchmark on it. Methodologically, we propose a multi-task learning framework that integrates lexical matching, semantic encoding, graph neural networks, and an LLM-based cross-task re-ranking module to explicitly capture the dependencies between statutes and cases. Experiments show that our framework significantly outperforms single-task baselines on both tasks, validating the effectiveness of joint modeling. Notably, the LLM-driven re-ranker improves Recall@10 by +5.2% for statute retrieval and +4.8% for precedent retrieval, marking the first successful joint optimization of cross-type legal text retrieval.
📝 Abstract
Identifying and retrieving relevant statutes and prior cases (precedents) for a given legal situation are common tasks performed by law practitioners. Researchers to date have addressed the two tasks independently, developing completely different datasets and models for each; however, the two retrieval tasks are inherently related, e.g., similar cases tend to cite similar statutes (owing to similar factual situations). In this paper, we address this gap. We propose IL-PCR (Indian Legal corpus for Prior Case and Statute Retrieval), a unique corpus that provides a common testbed for developing models for both tasks (Statute Retrieval and Precedent Retrieval) that can exploit the dependence between the two. We experiment extensively with several baseline models on both tasks, including lexical models, semantic models, and GNN-based ensembles. Further, to exploit the dependence between the two tasks, we develop an LLM-based re-ranking approach that achieves the best performance.
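The cross-task dependence motivating the work (similar cases tend to cite similar statutes) can be illustrated with a toy re-ranking sketch. Everything below is hypothetical: the candidate names, base scores, citation links, and the simple linear boost are illustrative only and do not reproduce the paper's actual LLM-based re-ranker, which jointly conditions on both candidate lists.

```python
# Hypothetical sketch: boost each candidate statute's base retrieval score
# by the scores of candidate precedents that cite it, so the two tasks
# inform each other. All names and numbers are illustrative.

def rerank_statutes(statute_scores, precedent_scores, citations, alpha=0.5):
    """Return statutes ranked by base score plus a citation-weighted boost
    from the precedent candidate list."""
    boosted = {}
    for statute, base in statute_scores.items():
        # Sum the scores of precedents whose citation set contains this statute.
        boost = sum(score for prec, score in precedent_scores.items()
                    if statute in citations.get(prec, set()))
        boosted[statute] = base + alpha * boost
    return sorted(boosted, key=boosted.get, reverse=True)

# Toy example: two candidate statutes, two candidate precedents.
statute_scores = {"S1": 0.40, "S2": 0.45}
precedent_scores = {"P1": 0.9, "P2": 0.2}
citations = {"P1": {"S1"}, "P2": {"S2"}}  # statutes each precedent cites

# S1: 0.40 + 0.5*0.9 = 0.85 ; S2: 0.45 + 0.5*0.2 = 0.55,
# so the strongly supported S1 overtakes S2 despite a lower base score.
print(rerank_statutes(statute_scores, precedent_scores, citations))
```

The same boost can be applied symmetrically to re-rank precedents using statute scores; the paper replaces this hand-set linear combination with an LLM-based cross-task re-ranking module.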