DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

📅 2024-10-04
🏛️ International Conference on Learning Representations
📈 Citations: 9
Influential: 0
🤖 AI Summary
Large language models (LLMs) often follow rigid, static reasoning paths that fail to adapt to problem-specific characteristics or to the model's own capabilities. Method: This paper proposes a dynamic reasoning-path search framework built on atomic reasoning actions. It iteratively explores and evaluates candidate trajectories to automatically synthesize an optimal reasoning sequence for each input. The framework supports both fine-tuning an external planner and end-to-end fine-tuning of the task-solving LLM, aided by a dual-path supervised fine-tuning mechanism. Contribution/Results: The core contribution is a trajectory-search-driven dynamic reasoning planning paradigm that enables difficulty-aware, adaptive allocation of computation. Evaluated on eight mainstream reasoning benchmarks, the method significantly outperforms static prompting and instruction-tuning baselines, demonstrating strong generalization and effectiveness.
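The iterative exploration and evaluation of trajectories can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the action names, the `solver(question, trajectory)` callable, and the sampling-based scoring are all assumptions for the sketch.

```python
# Sketch of searching for an optimal reasoning-action trajectory per
# question. Action names and the `solver` interface are hypothetical.
from itertools import product

# Hypothetical atomic reasoning actions (illustrative names only).
ATOMIC_ACTIONS = [
    "rewrite_query", "decompose", "chain_of_thought",
    "program_of_thought", "self_verify",
]

def search_optimal_trajectory(question, answer, solver, max_len=3, samples=4):
    """Enumerate short action sequences and keep the one with the
    highest empirical accuracy for this particular solver."""
    best, best_acc = None, -1.0
    for length in range(1, max_len + 1):
        for traj in product(ATOMIC_ACTIONS, repeat=length):
            # Score a trajectory by running the solver several times
            # and counting how often it reaches the gold answer.
            hits = sum(solver(question, traj) == answer for _ in range(samples))
            acc = hits / samples
            if acc > best_acc:
                best, best_acc = list(traj), acc
    return best, best_acc
```

In the paper's setting the scoring would also account for cost (shorter trajectories preferred at equal accuracy), which is why the search can allocate deeper reasoning only to harder questions.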

📝 Abstract
Enhancing the capability of large language models (LLMs) in reasoning has gained significant attention in recent years. Previous studies have demonstrated the effectiveness of various prompting strategies in aiding LLMs in reasoning (called "reasoning actions"), such as step-by-step thinking, reflecting before answering, solving with programs, and their combinations. However, these approaches often applied static, predefined reasoning actions uniformly to all questions, without considering the specific characteristics of each question or the capability of the task-solving LLM. In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. Our approach involves three key steps: i) defining atomic reasoning action modules that can be composed into various reasoning action trajectories; ii) searching for the optimal action trajectory for each training question through iterative exploration and evaluation for the specific task-solving LLM; and iii) using the collected optimal trajectories to train an LLM to plan for the reasoning trajectories of unseen questions. In particular, we propose two learning paradigms, i.e., fine-tuning an external LLM as a planner to guide the task-solving LLM, or directly fine-tuning the task-solving LLM with an internalized capability for reasoning actions planning. Our experiments across eight reasoning tasks show that our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach. Further analysis reveals that our method enables LLMs to adjust their computation based on problem complexity, allocating deeper thinking and reasoning to harder problems.
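The two learning paradigms in the abstract differ mainly in what the collected optimal trajectories supervise. The sketch below shows one plausible way to turn a searched trajectory into fine-tuning examples for each paradigm; the field names and prompt formats are assumptions, not the paper's exact templates.

```python
# Hedged sketch: turning searched trajectories into SFT examples for
# the two paradigms. Prompt wording and record fields are illustrative.

def planner_example(question, trajectory):
    """External-planner paradigm: a separate LLM learns to map a
    question to an action trajectory that then guides the solver."""
    return {
        "input": f"Question: {question}\nPlan the reasoning actions.",
        "output": " -> ".join(trajectory),
    }

def internalized_example(question, trajectory, solution):
    """Internalized paradigm: the task-solving LLM itself learns to
    emit the plan before producing the solution."""
    return {
        "input": f"Question: {question}",
        "output": f"Plan: {' -> '.join(trajectory)}\nSolution: {solution}",
    }
```

Under this framing, the internalized variant keeps a single model but makes each target sequence longer, while the planner variant adds a second model whose outputs condition the solver's prompt.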
Problem

Research questions and friction points this paper is trying to address.

Dynamic reasoning in LLMs via optimal trajectory search
Tailoring reasoning actions to question and LLM capabilities
Training LLMs to adjust computation based on problem complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic reasoning via optimal trajectory search
Composable atomic reasoning action modules
Training LLMs to plan reasoning trajectories