GuRE: Generative Query REwriter for Legal Passage Retrieval

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Legal passage retrieval (LPR) suffers from severe lexical mismatch between queries and target passages. To address this, we propose GuRE, a generative query rewriting method that introduces lightweight large language models (LLMs) to legal retrieval for the first time. GuRE employs supervised query rewriting coupled with instruction tuning, achieving retriever-agnostic performance gains without end-to-end retriever fine-tuning. It leverages a retrieval-augmented evaluation framework and consistently outperforms state-of-the-art baselines across multiple legal retrieval benchmarks, demonstrating strong generalization and practical deployability. Our key contributions are: (1) pioneering the application of LLM-driven generative rewriting to LPR; (2) revealing how distinct training objectives differentially influence retrieval behavior; and (3) releasing an open-source, efficient, plug-and-play query rewriting solution.

📝 Abstract
Legal Passage Retrieval (LPR) systems are crucial as they help practitioners save time when drafting legal arguments. However, LPR remains an underexplored avenue. One primary reason is the significant vocabulary mismatch between the query and the target passage. To address this, we propose a simple yet effective method, the Generative query REwriter (GuRE). We leverage the generative capabilities of Large Language Models (LLMs) by training the LLM for query rewriting. "Rewritten queries" help retrievers to retrieve target passages by mitigating vocabulary mismatch. Experimental results show that GuRE significantly improves performance in a retriever-agnostic manner, outperforming all baseline methods. Further analysis reveals that different training objectives lead to distinct retrieval behaviors, making GuRE more suitable than direct retriever fine-tuning for real-world applications. Code is available at github.com/daehuikim/GuRE.
Problem

Research questions and friction points this paper is trying to address.

Addresses vocabulary mismatch in legal retrieval queries
Improves legal passage retrieval using generative query rewriting
Enhances retriever performance without direct fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative query rewriting using LLMs
Mitigates vocabulary mismatch in retrieval
Retriever-agnostic performance improvement
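The rewrite-then-retrieve idea summarized above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' implementation: `toy_rewriter` is a hypothetical stand-in for a trained rewriting LLM, `overlap_score` is a simple token-overlap scorer chosen for brevity (not a retriever named in the paper), and the passages and query are invented for the example.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Toy lexical retriever score: number of query tokens matched in the passage."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(count, p[token]) for token, count in q.items())


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the top-k passages by overlap score (any retriever could be swapped in)."""
    return sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)[:k]


def rewrite_then_retrieve(
    query: str, passages: List[str], rewriter: Callable[[str], str], k: int = 1
) -> List[str]:
    """Rewrite the query first, then hand the rewritten query to an unchanged retriever."""
    return retrieve(rewriter(query), passages, k)


def toy_rewriter(query: str) -> str:
    """Hypothetical stand-in for a trained rewriter: a canned lookup, not an LLM."""
    canned = {
        "Can the owner keep my money after I move out?":
            "A landlord may retain the security deposit for unpaid rent or damage.",
    }
    return canned.get(query, query)


# Invented example data: the second passage is the intended target.
passages = [
    "The buyer may keep the goods if the seller fails to reclaim them.",
    "A landlord may retain a security deposit to cover unpaid rent or damage.",
]
query = "Can the owner keep my money after I move out?"

raw_top = retrieve(query, passages)[0]
rewritten_top = rewrite_then_retrieve(query, passages, toy_rewriter)[0]
print("raw query       ->", raw_top)
print("rewritten query ->", rewritten_top)
```

In this toy setup the lay-language query shares only incidental words ("the", "keep") with the distractor and none with the target, so the raw query retrieves the wrong passage, while the rewritten query uses the target's vocabulary and retrieves it. The retriever itself is untouched, which is the sense in which such a rewriter is retriever-agnostic.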
Daehee Kim
NAVER Cloud
Deep Learning, Vision and Language, Optical Character Recognition, Domain Generalization
Deokhyung Kang
Graduate School of Artificial Intelligence, POSTECH, South Korea
Jonghwi Kim
Graduate School of Artificial Intelligence, POSTECH, South Korea
Sangwon Ryu
POSTECH
Natural Language Processing, Text Summarization, Reinforcement Learning, Large Language Models
Gary Geunbae Lee
Department of Computer Science and Engineering, POSTECH, South Korea