AI Summary
To address the challenges of identifying long-tail cases and achieving high accuracy for semantically ambiguous queries in legal case retrieval, this paper proposes a global case graph modeling method that integrates large language model (LLM)-derived text embeddings with inductive graph learning. The method explicitly encodes case citation relationships into a global graph structure, a novel formulation in legal IR. It introduces a node-degree-regularized contrastive learning objective to guide a graph neural network (GNN) to self-adaptively optimize node representations in a fully unsupervised manner. Crucially, the approach leverages citation topology to enhance semantic consistency without requiring labeled case data. Evaluated on COLIEE 2025 Task 1, it achieves second place overall, demonstrating significant improvements in long-tail case recall and semantic matching accuracy for vague or underspecified queries.
Abstract
Legal case retrieval plays a pivotal role in the legal domain by facilitating the efficient identification of relevant cases, helping legal professionals and researchers build legal arguments and make informed decisions. To improve retrieval accuracy, the Competition on Legal Information Extraction and Entailment (COLIEE) is held annually, offering updated benchmark datasets for evaluation. This paper presents a detailed description of CaseLink, the method employed by UQLegalAI, the second-ranked team in Task 1 of COLIEE 2025. The CaseLink model utilises inductive graph learning and Global Case Graphs to capture the intrinsic connectivity between cases and thereby improve the accuracy of legal case retrieval. Specifically, a large language model specialised in text embedding transforms legal texts into embeddings, which serve as the feature representations of the nodes in the constructed case graph. A new contrastive objective, incorporating a regularisation on the degree of case nodes, is proposed to leverage the information within the case reference relationships for model optimisation. Our implementation builds on the open-source CaseLink repository: https://github.com/yanran-tang/CaseLink.
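
To make the shape of such an objective concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the contrastive term is a standard InfoNCE loss over cosine similarities, the degree term is the mean summed cosine similarity of each candidate node to the other candidates (a soft "pseudo-degree"), and the names `info_nce`, `degree_penalty`, `total_loss`, `temperature`, and `weight` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one query (assumed form of the contrastive term).

    The loss is the negative log-softmax of the positive's similarity
    among the positive and all negatives.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def degree_penalty(node_embeddings):
    """Hypothetical degree regulariser: mean soft degree over candidate nodes.

    Each node's soft degree is the sum of its cosine similarities to every
    other node, so densely connected representations are penalised.
    """
    n = len(node_embeddings)
    if n == 0:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += cosine(node_embeddings[i], node_embeddings[j])
    return total / n


def total_loss(query, positive, negatives, temperature=0.1, weight=0.01):
    """Contrastive term plus weighted degree term (the weighting is an assumption)."""
    candidates = [positive] + list(negatives)
    contrastive = info_nce(query, positive, negatives, temperature)
    return contrastive + weight * degree_penalty(candidates)
```

In a real pipeline the vectors would be GNN-updated node representations initialised from the LLM text embeddings, and the loss would be computed on batched tensors rather than Python lists; the list-based version is used here only to keep the sketch dependency-free.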