ReaKase-8B: Legal Case Retrieval via Knowledge and Reasoning Representations with LLMs

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses insufficient semantic matching in legal case retrieval by proposing ReaKase-8B, a framework that jointly models legal entity-relation triples and judicial reasoning processes to construct context-aware, knowledge-enhanced case representations. Methodologically, it fine-tunes a large language model to integrate multi-granularity knowledge, including legal facts, disputed issues, structured relational triples, and reasoning chains, so that the resulting embeddings support fine-grained discrimination between cases. Evaluated on the COLIEE 2022 and COLIEE 2023 benchmarks, ReaKase-8B substantially outperforms baseline models, indicating that jointly modeling domain knowledge and reasoning improves retrieval accuracy. The implementation is publicly available.

📝 Abstract

Legal case retrieval (LCR) is a cornerstone of real-world legal decision making, as it enables practitioners to identify precedents for a given query case. Existing approaches mainly rely on traditional lexical models and pretrained language models to encode the texts of legal cases. Yet there is rich information in the relations among different legal entities, as well as in the crucial reasoning process that uncovers how legal facts and legal issues lead to judicial decisions. This relational reasoning process reflects the distinctive characteristics of each case that distinguish it from others, mirroring the real-world judicial process. Naturally, incorporating such information into the case embedding could further enhance the accuracy of case retrieval. In this paper, a novel ReaKase-8B framework is proposed to leverage extracted legal facts, legal issues, legal relation triplets and legal reasoning for effective legal case retrieval. ReaKase-8B designs an in-context legal case representation learning paradigm with a fine-tuned large language model. Extensive experiments on two benchmark datasets from COLIEE 2022 and COLIEE 2023 demonstrate that the knowledge- and reasoning-augmented embeddings substantially improve retrieval performance over baseline models, highlighting the potential of integrating legal reasoning into legal case retrieval systems. The code has been released at https://github.com/yanran-tang/ReaKase-8B.

Problem

Research questions and friction points this paper is trying to address.

Enhancing legal case retrieval accuracy by incorporating legal reasoning processes

Leveraging legal facts, issues, relations and reasoning for precise case embeddings

Improving retrieval performance over traditional lexical and language model approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages legal facts, issues, relations and reasoning

Uses fine-tuned LLM for in-context representation learning

Augments embeddings with legal knowledge to improve retrieval
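
To make the idea of a knowledge-augmented case representation concrete, here is a minimal sketch in Python. It is not the authors' implementation: `CaseKnowledge`, `build_prompt`, `toy_embed`, and `rank_candidates` are hypothetical names, the prompt layout is an assumption, and `toy_embed` is a hashed bag-of-words stand-in for the fine-tuned LLM encoder described in the abstract. It only illustrates the general flow of combining facts, issues, relation triplets, and reasoning into one text, embedding it, and ranking candidates by cosine similarity.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class CaseKnowledge:
    """Hypothetical container for the four knowledge components of a case."""
    case_id: str
    facts: str
    issues: str
    triplets: list  # list of (head, relation, tail) tuples
    reasoning: str


def build_prompt(case):
    """Join the knowledge components into one text (layout is an assumption)."""
    triplet_text = "; ".join(f"({h}, {r}, {t})" for h, r, t in case.triplets)
    return (
        f"Legal facts: {case.facts}\n"
        f"Legal issues: {case.issues}\n"
        f"Relation triplets: {triplet_text}\n"
        f"Legal reasoning: {case.reasoning}"
    )


def toy_embed(text, dim=64):
    """Stand-in encoder: hashed bag-of-words, L2-normalised.

    A real system would replace this with an LLM-based encoder.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query, candidates):
    """Return (case_id, score) pairs sorted from most to least similar."""
    q_vec = toy_embed(build_prompt(query))
    scored = [
        (c.case_id, cosine(q_vec, toy_embed(build_prompt(c))))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the prompt construction separate from the encoder means the stand-in `toy_embed` can be swapped for a real model without touching the ranking logic.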

🔎 Similar Papers

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval