🤖 AI Summary
In legal retrieval, statute retrieval and precedent retrieval are typically modeled separately, overlooking the semantic and citation interdependencies they exhibit in judicial practice. To address this, we introduce IL-PCR, the first Indian legal corpus explicitly designed for joint modeling of both tasks, and establish a unified benchmark on it. Methodologically, we propose a multi-task learning framework that integrates lexical matching, semantic encoding, graph neural networks, and an LLM-based cross-task re-ranking module to explicitly capture the dependencies between statutes and cases. Experiments show that our framework significantly outperforms single-task baselines on both tasks, validating the effectiveness of joint modeling. Notably, the LLM-driven re-ranker improves Recall@10 by +5.2% for statute retrieval and +4.8% for precedent retrieval, marking the first successful joint optimization of cross-type legal text retrieval.
📝 Abstract
Identifying and retrieving relevant statutes and prior cases (precedents) for a given legal situation are common tasks performed by law practitioners. Researchers to date have addressed the two tasks independently, developing completely different datasets and models for each; however, the two retrieval tasks are inherently related, e.g., similar cases tend to cite similar statutes (owing to similar factual situations). In this paper, we address this gap. We propose IL-PCR (Indian Legal corpus for Prior Case and Statute Retrieval), a unique corpus that provides a common testbed for developing models for both tasks (Statute Retrieval and Precedent Retrieval) that can exploit the dependence between the two. We experiment extensively with several baseline models on both tasks, including lexical models, semantic models, and GNN-based ensembles. Further, to exploit the dependence between the two tasks, we develop an LLM-based re-ranking approach that achieves the best performance.
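The cross-task dependence motivating the work (similar cases tend to cite similar statutes) can be illustrated with a toy re-ranking sketch. Everything below is hypothetical: the candidate names, base scores, citation links, and the simple linear boost are illustrative only and do not reproduce the paper's actual LLM-based re-ranker, which jointly conditions on both candidate lists.

```python
# Hypothetical sketch: boost each candidate statute's base retrieval score
# by the scores of candidate precedents that cite it, so the two tasks
# inform each other. All names and numbers are illustrative.

def rerank_statutes(statute_scores, precedent_scores, citations, alpha=0.5):
    """Return statutes ranked by base score plus a citation-weighted boost
    from the precedent candidate list."""
    boosted = {}
    for statute, base in statute_scores.items():
        # Sum the scores of precedents whose citation set contains this statute.
        boost = sum(score for prec, score in precedent_scores.items()
                    if statute in citations.get(prec, set()))
        boosted[statute] = base + alpha * boost
    return sorted(boosted, key=boosted.get, reverse=True)

# Toy example: two candidate statutes, two candidate precedents.
statute_scores = {"S1": 0.40, "S2": 0.45}
precedent_scores = {"P1": 0.9, "P2": 0.2}
citations = {"P1": {"S1"}, "P2": {"S2"}}  # statutes each precedent cites

# S1: 0.40 + 0.5*0.9 = 0.85 ; S2: 0.45 + 0.5*0.2 = 0.55,
# so the strongly supported S1 overtakes S2 despite a lower base score.
print(rerank_statutes(statute_scores, precedent_scores, citations))
```

The same boost can be applied symmetrically to re-rank precedents using statute scores; the paper replaces this hand-set linear combination with an LLM-based cross-task re-ranking module.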