ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laypeople

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses legal statute identification in real-world settings, where laypeople pose informal queries. Existing research relies predominantly on facts from court judgments and has rarely examined how such data differs from laypeople queries. To bridge this gap, the authors introduce ILSIC, a corpus of laypeople queries covering more than 500 Indian statutes, together with court judgments that allow direct comparison and cross-domain transfer studies. They benchmark zero-shot and few-shot inference, retrieval-augmented generation (RAG), and supervised fine-tuning, and find that models trained solely on court judgments are ineffective on laypeople queries, while transfer learning from court to laypeople data helps in certain scenarios. Fine-grained analyses break down the results by query category and statute frequency.

📝 Abstract

Legal Statute Identification (LSI) for a given situation is one of the most fundamental tasks in Legal NLP. This task has traditionally been modeled using facts from court judgments as input queries, due to their abundance. However, in practical settings, the input queries are likely to be informal and asked by laypeople, i.e., non-professionals. While a few laypeople LSI datasets exist, there has been little research exploring the differences between court and laypeople data for LSI. In this work, we create ILSIC, a corpus of laypeople queries covering 500+ statutes from Indian law. The corpus also contains court case judgments, enabling researchers to compare court and laypeople data for LSI. We conduct extensive experiments on our corpus, including benchmarking on the laypeople dataset using zero-shot and few-shot inference, retrieval-augmented generation, and supervised fine-tuning. We observe that models trained purely on court judgments are ineffective when tested on laypeople queries, while transfer learning from court to laypeople data can be beneficial in certain scenarios. We also conduct fine-grained analyses of our results in terms of query categories and statute frequency.

Problem

Research questions and friction points this paper is trying to address.

Legal Statute Identification

laypeople queries

Legal NLP

court judgments

corpus

Innovation

Methods, ideas, or system contributions that make the work stand out.

Legal Statute Identification

Laypeople Queries

Corpus Construction