🤖 AI Summary
Problem: Existing chain-of-thought (CoT) methods generalize poorly to logic-intensive, domain-specific reasoning tasks, such as legal reasoning, that require deep, structured domain knowledge; meanwhile, Monte Carlo Tree Search (MCTS) lacks adaptability to professional reasoning contexts. Method: This paper pioneers the integration of MCTS into domain-specific reasoning via a step-level supervision framework with two components: (i) a knowledge-guided mechanism that constrains the MCTS search space by explicitly aligning domain rules (e.g., statutory provisions, legal elements) with reasoning steps; and (ii) a learnable reflection-path preference model that enhances self-monitoring and correction of erroneous reasoning trajectories. Contribution/Results: The approach achieves significant improvements over state-of-the-art baselines across multiple legal reasoning benchmarks. Further analysis reveals a strong positive correlation between fine-grained domain knowledge representation (such as statutory hierarchy and precise legal-element decomposition) and reasoning accuracy, establishing a novel paradigm for expert-level AI reasoning.
📝 Abstract
Stepwise supervision of chains of thought (CoTs), built with the help of Monte Carlo Tree Search (MCTS), has recently improved performance on logical reasoning tasks such as coding and math. However, its contribution to tasks that require domain-specific expertise and knowledge remains unexplored. Motivated by this gap, we identify several potential challenges that vanilla MCTS faces in this setting and propose Stepwise Domain Knowledge-Driven Reasoning Optimization, a framework that uses MCTS to construct step-level supervision for problems requiring comprehension, reasoning, and specialized knowledge. We also introduce Preference Optimization towards Reflection Paths, which iteratively learns to self-reflect on its reasoning from better perspectives. We conduct extensive experiments to evaluate both methods, and the empirical results demonstrate their effectiveness on a variety of legal-domain problems. Finally, we report a diverse set of findings that we hope will encourage further research on domain-specific LLMs and MCTS.
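As an illustration of how MCTS can yield step-level supervision, the following is a minimal toy sketch and not the paper's implementation. Everything in it is an assumption made for illustration: `propose_steps` stands in for an LLM proposing reasoning steps, `KNOWN_STEPS` and `knowledge_filter` stand in for a domain-knowledge constraint on the search space, `GOLD` is a made-up correct path, and `step_preferences` turns gaps between sibling value estimates into (prefix, better step, worse step) triples of the kind a preference-optimization stage could consume.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

MAX_DEPTH = 2
GOLD = ("apply rule R1", "check element E1")  # hypothetical "correct" path
KNOWN_STEPS = {"apply rule R1", "apply rule R2",
               "check element E1", "check element E2"}

def propose_steps(path):
    """Stand-in for an LLM proposing candidate next steps (hypothetical)."""
    if len(path) == 0:
        return ["apply rule R1", "apply rule R2", "guess the outcome"]
    return ["check element E1", "check element E2", "skip the elements"]

def knowledge_filter(steps):
    """Keep only steps that match the toy domain-knowledge table."""
    return [s for s in steps if s in KNOWN_STEPS]

def reward(path):
    """Outcome reward: 1 for the gold path, 0 otherwise."""
    return 1.0 if tuple(path) == GOLD else 0.0

@dataclass
class Node:
    path: tuple
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def uct(child, parent_visits, c=1.4):
    """Upper confidence bound; unvisited children are tried first."""
    if child.visits == 0:
        return float("inf")
    return child.q + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(path, rng):
    """Complete the path with random filtered steps and score the outcome."""
    path = list(path)
    while len(path) < MAX_DEPTH:
        path.append(rng.choice(knowledge_filter(propose_steps(path))))
    return reward(path)

def search(n_sim=200, seed=0):
    rng = random.Random(seed)
    root = Node(path=())
    for _ in range(n_sim):
        node = root
        # Selection: descend by UCT to a leaf of the tree built so far.
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: add only knowledge-consistent steps below the leaf.
        if len(node.path) < MAX_DEPTH:
            for step in knowledge_filter(propose_steps(node.path)):
                node.children.append(Node(node.path + (step,), parent=node))
            node = node.children[0]
        # Simulation and backpropagation.
        value = rollout(node.path, rng)
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root

def step_preferences(node, margin=0.2):
    """Collect (prefix, better_step, worse_step) from sibling Q gaps."""
    pairs = []
    visited = [ch for ch in node.children if ch.visits > 0]
    for a in visited:
        for b in visited:
            if a.q - b.q >= margin:
                pairs.append((node.path, a.path[-1], b.path[-1]))
    for ch in node.children:
        pairs.extend(step_preferences(ch, margin))
    return pairs

root = search()
pairs = step_preferences(root)
```

In this sketch the per-node estimate `q` plays the role of a step-level signal derived from outcome rewards alone, and the filter keeps steps such as "guess the outcome" out of the tree entirely. A real system would replace the lookup table with retrieval over statutes or legal elements and the random rollout with model-generated continuations.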