🤖 AI Summary
Existing legal case retrieval (LCR) research is constrained by small, narrow-domain corpora (at most 55K cases, with queries covering only a few offense types) and by reliance on embedding-based or lexical matching, which yields limited case representations and legally irrelevant matches. To address these limitations, we introduce LEGAR BENCH, the first large-scale Korean LCR benchmark, whose queries span 411 offense categories over a corpus of more than 1.2 million cases. We further propose LegalSearchLM, a retrieval model that reframes LCR as a legal-element-driven generative task. It combines structured case representation, legal-element reasoning over the query case, and constrained decoding so that the generated content stays grounded in the target cases. A cross-offense generalization training strategy is also introduced to improve out-of-domain robustness. On LEGAR BENCH, LegalSearchLM outperforms the evaluated baselines by 6–20%, achieving state-of-the-art performance; on out-of-domain cases, it outperforms naive generative models trained on in-domain data by 15%.
📝 Abstract
Legal Case Retrieval (LCR), the task of retrieving cases relevant to a given query case, is fundamental for legal professionals in research and decision-making. However, existing studies on LCR face two major limitations. First, they are evaluated on relatively small retrieval corpora (e.g., 100–55K cases) with a narrow range of criminal query types, which do not adequately reflect the complexity of real-world legal retrieval scenarios. Second, their reliance on embedding-based or lexical matching methods often results in limited representations and legally irrelevant matches. To address these issues, we present: (1) LEGAR BENCH, the first large-scale Korean LCR benchmark, whose queries cover 411 diverse crime types over a corpus of 1.2M legal cases; and (2) LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases through constrained decoding. Experimental results show that LegalSearchLM outperforms baselines by 6–20% on LEGAR BENCH, achieving state-of-the-art performance. It also demonstrates strong generalization to out-of-domain cases, outperforming naive generative models trained on in-domain data by 15%.
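The abstract states that LegalSearchLM generates content grounded in the target cases through constrained decoding, but it does not specify the mechanism. A common way to realize this idea in generative retrieval is to index the corpus in a prefix structure and, at each decoding step, allow only tokens that continue a span actually present in some case; the generated span then identifies the cases that contain it. The sketch below illustrates that generic technique, not the authors' implementation, under loudly stated assumptions: whitespace tokens stand in for subword tokens, a toy trie over all spans of up to `max_len` tokens stands in for a scalable corpus index, and a hypothetical `score` function stands in for the language model's next-token scores. The names `build_trie`, `constrained_greedy_decode`, and the example cases are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


class TrieNode:
    """Node in a toy prefix trie over spans of case text (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # IDs of the cases whose text contains the span ending at this node.
        self.doc_ids: Set[str] = set()


def build_trie(corpus: Dict[str, Sequence[str]], max_len: int = 8) -> TrieNode:
    """Index every span of up to max_len tokens, starting at every position."""
    root = TrieNode()
    for doc_id, tokens in corpus.items():
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_len]:
                node = node.children.setdefault(tok, TrieNode())
                node.doc_ids.add(doc_id)
    return root


def constrained_greedy_decode(
    root: TrieNode,
    score: Callable[[List[str], str], float],
    max_steps: int,
) -> Tuple[List[str], Set[str]]:
    """Greedy decoding restricted to continuations that exist in the corpus.

    `score(prefix, token)` is a placeholder for a language model's
    next-token score; only tokens that are children of the current trie
    node are ever considered, so the output is always a corpus span.
    """
    node = root
    output: List[str] = []
    for _ in range(max_steps):
        allowed = list(node.children)  # the decoding constraint
        if not allowed:  # the span cannot be extended any further
            break
        best = max(allowed, key=lambda tok: score(output, tok))
        output.append(best)
        node = node.children[best]
    # Cases containing the generated span are the retrieved candidates.
    return output, set(node.doc_ids)


# Hypothetical mini-corpus of case facts and query-side legal elements.
corpus = {
    "case_A": "defendant forged a contract to obtain a loan".split(),
    "case_B": "defendant assaulted the victim with a knife".split(),
    "case_C": "defendant forged a signature on a check".split(),
}
query_elements = {"defendant", "forged", "contract", "loan"}

root = build_trie(corpus, max_len=4)
tokens, docs = constrained_greedy_decode(
    root, lambda prefix, tok: 1.0 if tok in query_elements else 0.0, max_steps=10
)
# tokens == ['defendant', 'forged', 'a', 'contract'], docs == {'case_A'}
```

Because every step is restricted to the children of the current trie node, the model can never produce a span that is absent from the corpus, which is what keeps generation grounded in real cases. With an actual Hugging Face Transformers model, the same allowed-token lookup could be supplied to `generate` through its `prefix_allowed_tokens_fn` argument, with the trie built over tokenizer IDs instead of words.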