🤖 AI Summary
This study addresses the challenge of evaluating how multiple interdependent components jointly affect judicial question-answering performance in legal-domain RAG systems. To this end, we propose LRAGE, the first open-source, multilingual, full-pipeline explainable evaluation framework for legal RAG. LRAGE provides both a GUI and a CLI and systematically decouples five core components (retrieval corpus, retrieval algorithm, re-ranker, LLM backbone, and evaluation metrics), enabling cross-jurisdictional benchmarking (Chinese, English, Korean) and component-wise attribution analysis. Using mainstream tools, including Elasticsearch/FAISS for retrieval, BERT-based re-rankers, and Llama/Qwen LLMs, we quantitatively assess each component's impact on accuracy across three legal benchmarks: LegalBench, LawBench, and KBL. Experiments show how overall accuracy changes as each of the five components is varied, informing RAG optimization and deployment for judicial applications. The framework is publicly available under an open-source license.
📝 Abstract
Building retrieval-augmented generation (RAG) systems to enhance the capability of large language models (LLMs) has recently become common practice. This is especially relevant in the legal domain, where previous judicial decisions play a significant role under the doctrine of stare decisis, which emphasizes making decisions based on (retrieved) prior documents. However, the overall performance of a RAG system depends on many components: (1) retrieval corpora, (2) retrieval algorithms, (3) rerankers, (4) LLM backbones, and (5) evaluation metrics. Here we propose LRAGE, an open-source tool for holistic evaluation of RAG systems focusing on the legal domain. LRAGE provides GUI and CLI interfaces that facilitate seamless experiments and make it possible to investigate how changes in these five components affect overall accuracy. We validated LRAGE on multilingual legal benchmarks covering Korean (KBL), English (LegalBench), and Chinese (LawBench), demonstrating how overall accuracy changes as each of the five components is varied. The source code is available at https://github.com/hoorangyee/LRAGE.
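To make the idea of varying five components concrete, the following is a minimal sketch of how such an experiment grid might be enumerated. It is not LRAGE's API: the option names (`statutes`, `bm25`, `model-a`, and so on) and the helper functions `enumerate_configs` and `one_at_a_time` are hypothetical placeholders chosen for illustration only.

```python
from itertools import product

# Hypothetical option lists for the five components named in the abstract.
# None of these values or names are taken from LRAGE itself.
COMPONENTS = {
    "corpus": ["statutes", "precedents"],
    "retriever": ["bm25", "dense"],
    "reranker": [None, "cross-encoder"],
    "llm": ["model-a", "model-b"],
    "metric": ["accuracy"],
}


def enumerate_configs(components):
    """Yield one dict per combination of component choices (full grid)."""
    names = list(components)
    for values in product(*(components[name] for name in names)):
        yield dict(zip(names, values))


def one_at_a_time(components, baseline):
    """Yield configs that differ from `baseline` in exactly one component."""
    for name, options in components.items():
        for option in options:
            if option != baseline[name]:
                yield {**baseline, name: option}


configs = list(enumerate_configs(COMPONENTS))
baseline = configs[0]
ablations = list(one_at_a_time(COMPONENTS, baseline))
```

A full grid grows multiplicatively with the number of options per component, so a one-at-a-time sweep around a fixed baseline is a cheaper way to attribute an accuracy change to a single component, at the cost of missing interactions between components.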