🤖 AI Summary
This paper addresses insufficient semantic matching in legal case retrieval by proposing ReaKase-8B, a framework that jointly models legal entity-relation triplets and judicial reasoning processes to construct context-aware, knowledge-enhanced case representations. Methodologically, it fine-tunes a large language model to integrate multi-granularity knowledge, including legal facts, legal issues, structured relation triplets, and legal reasoning, so that cases can be discriminated at a fine-grained level through deep semantic encoding. Evaluated on the COLIEE 2022 and COLIEE 2023 benchmarks, ReaKase-8B substantially improves retrieval performance over baseline models, indicating that jointly modeling domain knowledge and reasoning improves retrieval accuracy. The implementation is publicly available.
📝 Abstract
Legal case retrieval (LCR) is a cornerstone of real-world legal decision making, as it enables practitioners to identify precedents for a given query case. Existing approaches mainly rely on traditional lexical models and pretrained language models to encode the texts of legal cases. Yet there is rich information in the relations among different legal entities, as well as in the crucial reasoning process that uncovers how legal facts and legal issues lead to judicial decisions. Such a relational reasoning process reflects the distinctive characteristics of each case that distinguish one from another, mirroring the real-world judicial process. Naturally, incorporating such information into the case embedding could further enhance the accuracy of case retrieval. In this paper, a novel ReaKase-8B framework is proposed to leverage extracted legal facts, legal issues, legal relation triplets and legal reasoning for effective legal case retrieval. ReaKase-8B designs an in-context legal case representation learning paradigm with a fine-tuned large language model. Extensive experiments on two benchmark datasets from COLIEE 2022 and COLIEE 2023 demonstrate that our knowledge and reasoning augmented embeddings substantially improve retrieval performance over baseline models, highlighting the potential of integrating legal reasoning into legal case retrieval systems. The code has been released at https://github.com/yanran-tang/ReaKase-8B.
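To make the retrieval setup concrete, the following is a minimal sketch of how the four knowledge types named in the abstract might be combined into one input and used to rank candidate cases. It is not the released implementation: `CaseKnowledge`, `build_prompt`, `toy_embed` and `rank_candidates` are hypothetical names, the prompt layout is an assumption, and the bag-of-words encoder is only a stand-in for the fine-tuned large language model that produces the actual embeddings.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CaseKnowledge:
    """Hypothetical container for the four knowledge types named in the abstract."""
    facts: str
    issues: str
    triplets: list = field(default_factory=list)  # (head, relation, tail) tuples
    reasoning: str = ""


def build_prompt(case):
    """Concatenate the knowledge pieces into one in-context input string (assumed layout)."""
    triplet_text = "; ".join(f"({h}, {r}, {t})" for h, r, t in case.triplets)
    return (
        f"Legal facts: {case.facts}\n"
        f"Legal issues: {case.issues}\n"
        f"Relation triplets: {triplet_text}\n"
        f"Legal reasoning: {case.reasoning}"
    )


def toy_embed(text):
    """Stand-in encoder: bag-of-words counts instead of an LLM embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, candidates):
    """Return (case_id, score) pairs sorted from most to least similar to the query."""
    q = toy_embed(build_prompt(query))
    scored = [(cid, cosine(q, toy_embed(build_prompt(c)))) for cid, c in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a real system, `toy_embed` would be replaced by the model's embedding of the assembled prompt; the ranking step by cosine similarity would stay the same in shape, though the paper's exact scoring function is not stated in the abstract.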