🤖 AI Summary
Legal Passage Retrieval (LPR) suffers from severe vocabulary mismatch between queries and target passages. To address this, we propose GuRE, a generative query rewriting method that applies lightweight large language models (LLMs) to legal retrieval. GuRE trains an LLM to rewrite queries so that the rewritten queries bridge the vocabulary gap, achieving retriever-agnostic performance gains without end-to-end retriever fine-tuning. It consistently outperforms all baseline methods, and further analysis shows that different training objectives lead to distinct retrieval behaviors, making GuRE better suited than direct retriever fine-tuning for real-world applications. Our key contributions are: (1) applying LLM-driven generative query rewriting to LPR; (2) revealing how distinct training objectives differentially influence retrieval behavior; and (3) releasing an open-source, efficient, plug-and-play query rewriting solution.
📝 Abstract
Legal Passage Retrieval (LPR) systems are crucial as they help practitioners save time when drafting legal arguments. However, LPR remains underexplored. One primary reason is the significant vocabulary mismatch between the query and the target passage. To address this, we propose a simple yet effective method, the Generative query REwriter (GuRE). We leverage the generative capabilities of Large Language Models (LLMs) by training an LLM for query rewriting. Rewritten queries help retrievers find target passages by mitigating the vocabulary mismatch. Experimental results show that GuRE significantly improves performance in a retriever-agnostic manner, outperforming all baseline methods. Further analysis reveals that different training objectives lead to distinct retrieval behaviors, making GuRE more suitable than direct retriever fine-tuning for real-world applications. Code is available at github.com/daehuikim/GuRE.
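The rewrite-then-retrieve idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `rewrite_query` stands in for the fine-tuned LLM rewriter (here a hard-coded expansion table), and the retriever is a simple token-overlap scorer where a real system would use BM25 or a dense encoder.

```python
def rewrite_query(query: str) -> str:
    # Placeholder for GuRE's trained LLM rewriter: it would expand the
    # query toward the target passage's legal vocabulary. The expansion
    # table below is purely illustrative.
    expansions = {"fired": "dismissal termination employment"}
    extra = " ".join(expansions.get(tok, "") for tok in query.lower().split())
    return (query + " " + extra).strip()

def score(query: str, passage: str) -> int:
    # Toy lexical retriever: number of shared tokens. A real pipeline
    # would plug in BM25 or a dense retriever here.
    return len(set(query.lower().split()) & set(passage.lower().split()))

passages = [
    "grounds for dismissal and termination of employment contracts",
    "notice requirements for property transfer between private parties",
]

query = "can I be fired without notice"

# Raw query shares no legal vocabulary with the relevant passage,
# so lexical retrieval latches onto an incidental word ("notice").
best_raw = max(passages, key=lambda p: score(query, p))

# The rewritten query surfaces "dismissal", "termination", "employment",
# steering retrieval to the relevant passage.
best_rewritten = max(passages, key=lambda p: score(rewrite_query(query), p))
```

Because the rewriting step only changes the query string, any retriever can consume its output unchanged, which is what makes the approach retriever-agnostic.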