CaseGNN++: Graph Contrastive Learning for Legal Case Retrieval with Graph Augmentation

📅 2024-05-20

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Existing legal case retrieval (LCR) methods largely neglect the graph-structured nature of legal documents and suffer from a scarcity of labelled data. To address these challenges, we propose CaseGNN++, an extension of CaseGNN built around EUGAT, an edge feature-based graph attention layer that explicitly models edge semantics in Text-Attributed Case Graphs (TACGs) by updating both node and edge features. We further design a contrastive learning objective grounded in graph augmentation (specifically, node and edge perturbations) to provide additional training signals and mitigate label scarcity. Additionally, we incorporate contextualized legal knowledge into both node and edge representations. Evaluated on the COLIEE 2022 and COLIEE 2023 benchmarks, CaseGNN++ significantly improves on CaseGNN and outperforms the state-of-the-art LCR methods it is compared against. To foster reproducibility, our implementation is publicly released.

📝 Abstract

Legal case retrieval (LCR) is a specialised information retrieval task that aims to find cases relevant to a given query case. LCR plays a pivotal role in helping legal practitioners find precedents. Most existing LCR methods are based on traditional lexical models and language models, which have achieved promising retrieval performance. However, the domain-specific structural information inherent in legal documents has yet to be exploited to improve performance further. Our previous work, CaseGNN, harnesses text-attributed graphs and graph neural networks to address this neglect of legal structural information. Nonetheless, two aspects remain for further investigation: (1) the underutilisation of rich edge information within text-attributed case graphs limits CaseGNN's ability to generate informative case representations; (2) the inadequacy of labelled data in legal datasets hinders the training of the CaseGNN model. In this paper, CaseGNN++, which extends CaseGNN, is proposed to leverage both the edge information and additional training signals to unlock the latent potential of LCR models. Specifically, an edge feature-based graph attention layer (EUGAT) is proposed to comprehensively update node and edge features during graph modelling, resulting in full utilisation of the structural information of legal cases. Moreover, a novel graph contrastive learning objective with graph augmentation is developed in CaseGNN++ to provide additional training signals, thereby enhancing the legal comprehension capabilities of the CaseGNN++ model. Extensive experiments on two benchmark datasets, COLIEE 2022 and COLIEE 2023, demonstrate that CaseGNN++ not only significantly improves on CaseGNN but also achieves superior performance compared to state-of-the-art LCR methods. Code has been released at https://github.com/yanran-tang/CaseGNN.

Problem

Research questions and friction points this paper is trying to address.

Enhances legal case retrieval with graph contrastive learning

Addresses underutilized edge data and insufficient training signals

Incorporates contextualized LLM embeddings for legal structure understanding
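
To make the "underutilized edge data" point concrete, here is a minimal sketch of one message-passing step in which edge features take part in the attention scores and are themselves updated. It is not the paper's EUGAT formulation, which this page does not spell out: the function name `edge_aware_attention_step`, the dot-product scoring, the additive messages, and the averaging edge update are all illustrative assumptions, and no learned weights are included.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_aware_attention_step(node_feats, edge_feats):
    """One illustrative step that updates both node and edge features.

    node_feats: dict mapping node id -> feature vector (list of floats)
    edge_feats: dict mapping (src, dst) -> feature vector of the same size
    Assumption: this is a hand-rolled stand-in, not the EUGAT layer itself.
    """
    # Collect the source nodes of the incoming edges of every node.
    incoming = {node: [] for node in node_feats}
    for src, dst in edge_feats:
        incoming[dst].append(src)

    new_nodes = {}
    for i, h_i in node_feats.items():
        sources = incoming[i]
        if not sources:
            # Nodes without incoming edges keep their features.
            new_nodes[i] = list(h_i)
            continue
        # A message combines the neighbour's features with the edge features.
        messages = [
            [a + b for a, b in zip(node_feats[j], edge_feats[(j, i)])]
            for j in sources
        ]
        # Attention weights depend on the edge through the message.
        weights = softmax([dot(h_i, msg) for msg in messages])
        new_nodes[i] = [
            sum(w * msg[d] for w, msg in zip(weights, messages))
            for d in range(len(h_i))
        ]

    # Edge update: average the old edge features with the updated endpoints.
    new_edges = {
        (src, dst): [
            (a + b + c) / 3.0
            for a, b, c in zip(new_nodes[src], e, new_nodes[dst])
        ]
        for (src, dst), e in edge_feats.items()
    }
    return new_nodes, new_edges
```

In a plain attention layer the edge would only decide which neighbours are visited; here the edge vector changes both the attention weight and the message, and it receives its own update, which is the general idea the summary attributes to an edge-updated layer.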

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses edge-updated graph attention for structural information

Incorporates graph contrastive learning with augmentation signals

Leverages LLM-generated contextualized node and edge features
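
The contrastive idea listed above can be sketched in a few lines: perturb a case graph to obtain a second view, encode both views, and pull them together while pushing other cases away. The sketch below is an assumption-laden illustration, not the released implementation: `drop_edges`, `mask_features`, and `toy_encoder` are hypothetical helpers, the encoder is a simple neighbour average standing in for a graph neural network, and the loss is the standard InfoNCE form, which may differ from the objective used in the paper.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Edge perturbation: remove each edge with probability drop_prob."""
    return [e for e in edges if rng.random() >= drop_prob]


def mask_features(features, mask_prob, rng):
    """Node perturbation: zero each feature entry with probability mask_prob."""
    return [
        [0.0 if rng.random() < mask_prob else x for x in row]
        for row in features
    ]


def toy_encoder(features, edges):
    """Stand-in for a GNN: one round of neighbour averaging, then mean pooling."""
    n, dim = len(features), len(features[0])
    neighbours = [[i] for i in range(n)]  # every node also sees itself
    for src, dst in edges:
        neighbours[dst].append(src)
        neighbours[src].append(dst)
    hidden = [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]
    return [sum(h[d] for h in hidden) / n for d in range(dim)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)
    # log-sum-exp over all candidates, computed stably
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos
```

A typical use would encode the original graph and an augmented copy (for example `toy_encoder(mask_features(x, 0.2, rng), drop_edges(edges, 0.2, rng))` with `rng = random.Random(0)`) as the anchor and positive, and use the encodings of other cases in the batch as negatives. Because the positive pair is generated from the graph itself, this signal needs no relevance labels, which is why it helps when labelled data is scarce.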

🔎 Similar Papers

Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study of Chinese Criminal Law