Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?

📅 2025-07-13
🤖 AI Summary
To address weak complex-reasoning ability and low legal-citation accuracy in Thai legal question answering, this paper combines Retrieval-Augmented Generation (RAG) with Group Relative Policy Optimization (GRPO). It uses the lightweight BGE-M3 embedding model as a semantic-similarity reward signal in place of computationally expensive LLM-based judges, cutting computational cost by up to 2.5×. Instruction tuning is additionally used to establish a joint learning framework for legal semantic matching and policy coordination. On the NitiBench benchmark, GRPO achieves up to a 90% citation-F1 gain over the base model and a 31% gain in joint quality metrics over instruction tuning. These results indicate improved robustness on multi-step reasoning and precise legal citation in Thai legal QA.

📝 Abstract
The performance of Retrieval-Augmented Generation (RAG) systems on Thai legal question answering is still limited, especially for questions requiring extensive, complex legal reasoning. To address these limitations, we introduce an approach that aligns LLMs toward improved law citation accuracy and better response quality using Group-Relative Policy Optimization (GRPO). Our approach leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward, reducing computational expenses by up to 2.5x compared to large language model judges. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves up to 90% citation-F1 gains over the base model and a 31% increase in joint quality metrics over instruction tuning. Crucially, our method shows enhanced robustness on complex legal reasoning tasks compared to instruction tuning, providing an effective and resource-efficient solution for enhancing Thai legal LLMs.
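
The abstract reports citation-F1 without defining it here. As a rough illustration only, the sketch below computes a standard set-based F1 between the law sections a model cites and the gold sections. The function name `citation_f1`, the placeholder section labels, and the convention that two empty citation sets score 1.0 are all assumptions for this example; the exact NitiBench definition may differ.

```python
def citation_f1(predicted, gold):
    """Set-based F1 between predicted and gold citations (illustrative only)."""
    pred_set, gold_set = set(predicted), set(gold)
    # Assumed convention: nothing to cite and nothing cited counts as perfect.
    if not pred_set and not gold_set:
        return 1.0
    true_pos = len(pred_set & gold_set)
    # No overlap (or one side empty) means precision or recall is zero.
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Placeholder labels, not real statute references.
predicted_sections = ["Act A, Section 1", "Act A, Section 2"]
gold_sections = ["Act A, Section 1", "Act B, Section 7"]
score = citation_f1(predicted_sections, gold_sections)
```

Here one of the two predicted sections is correct and one of the two gold sections is recovered, so precision and recall are both 0.5.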
Problem

Research questions and friction points this paper is trying to address.

Improving Thai legal question answering with complex reasoning
Enhancing law citation accuracy and response quality
Reducing computational costs in legal LLM optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Group-Relative Policy Optimization (GRPO)
Leverages BGE-M3 embeddings for cost efficiency
Improves citation accuracy and response quality
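
The two ideas listed above can be sketched together: score each sampled answer by its embedding similarity to a reference answer, then normalize the scores within the group, as GRPO does. This is a minimal sketch under stated assumptions, not the paper's implementation. The tiny hand-written vectors stand in for BGE-M3 embeddings, the helper names are hypothetical, the population standard deviation and `eps` value are choices made for the example, and any additional reward terms the paper may combine with the similarity score are not modeled.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Stand-in vectors; a real setup would embed text with an embedding model.
reference_embedding = [1.0, 0.0, 1.0]
sampled_embeddings = [
    [1.0, 0.0, 1.0],  # matches the reference
    [1.0, 1.0, 0.0],  # partial overlap
    [0.0, 1.0, 0.0],  # no overlap
]

rewards = [cosine_similarity(e, reference_embedding) for e in sampled_embeddings]
advantages = group_relative_advantages(rewards)
```

Answers scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to provide a baseline.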