🤖 AI Summary
This work addresses the limited performance of small language models on complex legal reasoning tasks, primarily due to the scarcity of high-quality, fine-grained reasoning trajectory data. To overcome this challenge, the authors propose LegalDrill, a novel framework featuring diagnosis-driven data synthesis and self-reflective sample selection. LegalDrill employs fine-grained prompts to extract and iteratively refine reasoning trajectories from a strong teacher model, then leverages self-reflection to automatically identify high-value training samples without human annotation. By integrating supervised fine-tuning with direct preference optimization, the method significantly enhances the legal reasoning capabilities of small models across multiple benchmarks, outperforming baselines that rely on manual labeling or standard sampling strategies.
📝 Abstract
Small language models (SLMs) are promising for real-world deployment due to their efficiency and low operational cost. However, their limited capacity makes them struggle with high-stakes legal reasoning tasks that require coherent statute interpretation and logically consistent deduction. Furthermore, training SLMs for such tasks demands high-quality, concise reasoning trajectories, which are prohibitively expensive to collect manually and difficult to curate via standard rejection sampling, since rejection sampling offers no granularity beyond final verdicts. To address these challenges, we propose LegalDrill, a diagnosis-driven synthesis framework that extracts and iteratively refines reasoning trajectories from a capable teacher via fine-grained prompting, and then employs self-reflective verification to adaptively select the most effective data for the SLM student. The resulting data empower SLM training through supervised fine-tuning and direct preference optimization. Extensive experiments on several legal benchmarks demonstrate that LegalDrill significantly bolsters the legal reasoning capabilities of representative SLMs while bypassing the need for scarce expert annotations, paving a scalable path toward practical legal reasoning systems.
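For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from the LegalDrill implementation: the function name `dpo_loss`, its argument names, and the value `beta=0.1` are illustrative assumptions, and the abstract does not say how preferred and dispreferred trajectories are paired.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    All names and the default beta are hypothetical, for illustration only.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)) = log(1 + exp(-logits)),
    # evaluated in a numerically stable way for either sign.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy matches the reference on both responses, the loss equals log 2. It falls below that value as the policy shifts probability toward the preferred response relative to the reference.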