🤖 AI Summary
This work addresses the limitations of large language models in the legal domain—specifically, insufficient domain knowledge and weak multi-step judicial reasoning capabilities. The authors propose a three-stage training framework: first, injecting legal knowledge via domain-adaptive sampling guided by perplexity scheduling; second, distilling structured reasoning trajectories through chain-of-thought distillation driven by agent-based workflows; and third, advancing from memorization to autonomous reasoning via curriculum reinforcement learning. A novel Plasticity-Adjusted Sampling strategy is introduced to balance knowledge acquisition with capability retention. The resulting Chinese legal foundation model outperforms larger general-purpose models across multiple benchmarks, demonstrating superior knowledge density and reasoning efficiency. The model weights and the LegalKit evaluation framework are publicly released.
📝 Abstract
While Large Language Models (LLMs) have demonstrated impressive general capabilities, their direct application in the legal domain is often hindered by a lack of precise domain knowledge and the difficulty of performing rigorous multi-step judicial reasoning. To address this gap, we present LegalOne, a family of foundation models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. First, during the mid-training phase, we propose Plasticity-Adjusted Sampling (PAS) to address the challenge of domain adaptation. This perplexity-based scheduler strikes a balance between the acquisition of new knowledge and the retention of original capabilities, effectively establishing a robust legal foundation. Second, during supervised fine-tuning, we employ Legal Agentic CoT Distillation (LEAD) to distill explicit reasoning from raw legal texts. Unlike naive distillation, LEAD utilizes an agentic workflow to convert complex judicial processes into structured reasoning trajectories, thereby enforcing factual grounding and logical rigor. Finally, we implement a Curriculum Reinforcement Learning (RL) strategy. Through a progressive reinforcement process spanning memorization, understanding, and reasoning, LegalOne evolves from simple pattern matching to autonomous and reliable legal reasoning. Experimental results demonstrate that LegalOne achieves state-of-the-art performance across a wide range of legal tasks, surpassing general-purpose LLMs with vastly larger parameter counts through enhanced knowledge density and efficiency. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI, paving the way for deploying trustworthy and interpretable foundation models in high-stakes judicial applications.
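The abstract does not spell out the Plasticity-Adjusted Sampling formula. The sketch below illustrates one plausible reading of a perplexity-based scheduler: domain samples the model already predicts well (low perplexity) or finds far too hard (very high perplexity) are down-weighted, mid-band samples are up-weighted, and a fraction of general-domain data is replayed to retain original capabilities. The thresholds `ppl_low`, `ppl_high`, and `replay_ratio`, and the triangular log-scale weighting, are illustrative assumptions, not the paper's actual method.

```python
import math
import random

def pas_weights(perplexities, ppl_low=5.0, ppl_high=200.0):
    """Assign sampling weights to legal-domain samples by perplexity.

    Hypothetical heuristic: low-PPL samples carry little new knowledge,
    extreme-PPL samples risk destabilizing training, so the mid band
    (on a log scale) receives the highest weight.
    """
    weights = []
    mid = math.sqrt(ppl_low * ppl_high)  # geometric midpoint of the band
    for ppl in perplexities:
        if ppl <= ppl_low or ppl >= ppl_high:
            weights.append(0.1)  # down-weight too-easy / too-hard samples
        else:
            # peaks at 1.0 when ppl == mid, falls linearly in log space
            weights.append(1.0 - abs(math.log(ppl / mid)) / math.log(ppl_high / mid))
    return weights

def sample_batch(domain_data, general_data, domain_ppls,
                 batch_size, replay_ratio=0.3):
    """Draw a mixed batch: PAS-weighted domain samples plus general replay."""
    n_replay = int(batch_size * replay_ratio)
    n_domain = batch_size - n_replay
    w = pas_weights(domain_ppls)
    batch = random.choices(domain_data, weights=w, k=n_domain)
    batch += random.choices(general_data, k=n_replay)
    return batch
```

In a real mid-training loop the perplexities would be re-estimated periodically under the current model checkpoint, so the schedule shifts as previously novel legal text becomes familiar.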