🤖 AI Summary
To address two key limitations in judicial-domain RAG—performance degradation caused by long-context inputs and the lack of expert-annotated, multi-task benchmarks—this paper proposes Parameterized Retrieval-Augmented Generation (P-RAG). P-RAG encodes retrieved legal cases as learnable parameter vectors and injects them into the feed-forward networks of large language models via LoRA-based low-rank adaptation, thereby alleviating the burden on the context window. The paper also introduces a multi-task legal evaluation benchmark comprising over 2,000 expert-annotated samples. Experimental results demonstrate that P-RAG maintains or improves downstream task performance while significantly reducing computational overhead, and that it enhances model stability and generalization in complex legal reasoning. This work establishes a lightweight, efficient, and scalable retrieval-augmented generation paradigm for judicial large language models.
📝 Abstract
Conventional RAG is considered one of the most effective methods for addressing model knowledge insufficiency and hallucination, particularly in the judicial domain, which demands high levels of knowledge rigor, logical consistency, and content integrity. However, conventional RAG injects retrieved documents directly into the model's context, which severely constrains models with limited context windows and introduces additional computational overhead from excessively long contexts, thereby disrupting the model's attention and degrading performance on downstream tasks. Moreover, many existing benchmarks lack expert annotation and focus solely on individual downstream tasks, whereas real-world legal scenarios mix multiple legal tasks, so conventional benchmarks fail to reflect models' true capabilities. To address these limitations, we propose PL-CA, which introduces a parametric RAG (P-RAG) framework that performs data augmentation on corpus knowledge, encodes this legal knowledge into parametric vectors, and then integrates the parametric knowledge into the LLM's feed-forward networks (FFN) via LoRA, thereby alleviating the model's context pressure. Additionally, we construct a multi-task legal dataset comprising more than 2,000 training and test instances, all expert-annotated and manually verified. Experiments on this dataset demonstrate that our method reduces the overhead associated with excessively long contexts while maintaining competitive downstream performance compared to conventional RAG. Our code and dataset are provided in the appendix.
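The core idea in the abstract, storing retrieved knowledge in low-rank weight updates to the FFN rather than in the prompt, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the dimensions, the `ffn_up` helper, and the random stand-in weights are all assumptions for illustration; in practice the LoRA factors would be trained on the augmented legal corpus and merged into a real transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, rank = 16, 64, 4

# Frozen FFN up-projection weight of the base LLM (random stand-in values).
W_up = rng.normal(size=(d_ff, d_model))

# Hypothetical "parametric document": LoRA factors that would be trained
# offline on augmented corpus knowledge, so the retrieved content lives in
# weights instead of occupying the context window.
A = rng.normal(scale=0.01, size=(rank, d_model))  # down-projection
B = np.zeros((d_ff, rank))                        # up-projection, zero-init

alpha = 8.0  # LoRA scaling hyperparameter

def ffn_up(x, use_doc_adapter=True):
    """FFN up-projection, optionally augmented with the document adapter."""
    h = W_up @ x
    if use_doc_adapter:
        # Standard LoRA update: W_up + (alpha / rank) * B @ A, applied lazily.
        h = h + (alpha / rank) * (B @ (A @ x))
    return h

x = rng.normal(size=d_model)
# With B zero-initialized, the adapter starts as an exact no-op, matching
# LoRA's usual initialization; training B/A then injects the knowledge.
assert np.allclose(ffn_up(x, True), ffn_up(x, False))
```

Because only the `rank x d_model` and `d_ff x rank` factors are trainable, each encoded document adds far fewer parameters than a full weight update, which is what makes the approach lightweight relative to prompting with the full retrieved text.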