🤖 AI Summary
To address the performance limitations and computational redundancy arising from fixed reasoning strategies in large language models (LLMs) for multi-hop question answering, this paper proposes DyPlan, a dynamic, content-aware framework that adaptively selects the most suitable strategy (reasoning, planning, or retrieval-augmented generation) per query, and extends it to DyPlan-verify. Its core innovation is the integration of meta-strategy decision-making, multi-strategy routing, and a self-verification feedback loop within a single dynamic planning paradigm; internal verification enables correction of the generated answer and keeps the selected strategy aligned with the question's semantics. Evaluated on three benchmark datasets (HotpotQA, 2WikiMultiHopQA, and MuSiQue), DyPlan achieves average accuracy improvements of 7-13% while reducing output-token and retrieval costs by 11-32% relative to the best baseline. The method thus improves accuracy, inference efficiency, and cost-effectiveness simultaneously.
📝 Abstract
Research has shown the effectiveness of reasoning (e.g., Chain-of-Thought), planning (e.g., SelfAsk), and retrieval-augmented generation strategies in improving the performance of Large Language Models (LLMs) on various tasks, such as question answering. However, using a single fixed strategy to answer different kinds of questions is suboptimal in performance and inefficient in terms of generated output tokens and performed retrievals. In our work, we propose a novel technique, DyPlan, to induce a dynamic strategy-selection process in LLMs, improving performance and reducing costs in question answering. DyPlan incorporates an initial decision step to select the most suitable strategy conditioned on the input question and guides the LLM's response generation accordingly. We extend DyPlan to DyPlan-verify, adding an internal verification and correction process to further enrich the generated answer. Experiments on three prominent multi-hop question answering (MHQA) datasets reveal how DyPlan can improve model performance by 7-13% while reducing the cost by 11-32% relative to the best baseline model.
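The control flow described above (decide a strategy from the question, generate accordingly, then verify and correct) can be sketched as pseudocode. This is a minimal illustrative sketch, not the paper's implementation: the function names (`classify_strategy`, `answer_with`, `verify`) and the keyword heuristic standing in for the LLM's decision step are all hypothetical.

```python
def classify_strategy(question: str) -> str:
    """Stand-in for DyPlan's initial decision step: pick a strategy
    from the question's content. A real system would prompt an LLM;
    here a toy keyword heuristic illustrates the routing."""
    q = question.lower()
    if " and " in q or "both" in q:
        return "planning"    # decompose multi-part questions (SelfAsk-style)
    if any(w in q for w in ("who", "when", "where")):
        return "retrieval"   # factoid questions lean on retrieval augmentation
    return "reasoning"       # default: direct chain-of-thought reasoning

def answer_with(strategy: str, question: str) -> str:
    """Stand-in for strategy-conditioned generation by the LLM."""
    return f"[{strategy}] answer to: {question}"

def verify(question: str, answer: str) -> bool:
    """Stand-in for DyPlan-verify's internal verification step
    (toy check: the answer must reference the question)."""
    return question in answer

def dyplan_verify(question: str, max_retries: int = 1) -> str:
    """Select a strategy, generate, verify, and correct on failure."""
    strategy = classify_strategy(question)
    answer = answer_with(strategy, question)
    for _ in range(max_retries):
        if verify(question, answer):
            break
        # Correction step: fall back to retrieval-augmented generation.
        strategy = "retrieval"
        answer = answer_with(strategy, question)
    return answer
```

The key design point the sketch mirrors is that the strategy choice happens once, up front, conditioned on the question, so cheap questions avoid the token and retrieval overhead of the heavier strategies.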