🤖 AI Summary
This paper addresses compositional question answering (QA) tasks that comprise both known and unknown subproblems, where existing methods lack fine-grained, per-subproblem adaptivity in choosing between internal knowledge and external retrieval.
Method: We propose Self-DC, an adaptive divide-and-conquer framework for large language models (LLMs), featuring meta-prompt-driven subproblem decomposition, dynamic execution path planning, and retrieval-generation co-scheduling. We further introduce CuQA, the first benchmark dataset for compositional QA with unknown subproblems.
Contribution/Results: Self-DC achieves state-of-the-art or competitive performance on two major benchmarks while significantly reducing external API calls by 38.7% on average. It establishes a novel paradigm of efficient, fine-grained reasoning-retrieval synergy. Its core innovations include a subproblem-level adaptive decision mechanism and a scalable hybrid solving architecture, enabling LLMs to dynamically optimize the balance between generation and retrieval at granular semantic units.
📝 Abstract
Previous research has typically concentrated on leveraging the internal knowledge of Large Language Models (LLMs) to answer known questions (i.e., *internal reasoning such as generate-then-read*). In contrast, for questions that fall outside their known scope, these models rely on external knowledge retrieval to provide accurate responses (i.e., *external acting such as retrieve-then-read*). However, few previous works consider *compositional questions*, which consist of several known and unknown sub-questions, necessitating a dynamic combination of the previous two methods (i.e., *internal reasoning and external acting*) to achieve a better trade-off between effectiveness and efficiency. To this end, we introduce a **Self** **D**ivide-and-**C**onquer (*`Self-DC`*) framework, accompanied by the first **C**ompositional **u**nknown **Q**uestion-**A**nswering dataset (CuQA). This framework enables LLMs to adaptively choose between using internal knowledge and retrieving external knowledge as needed, resulting in a better trade-off between effectiveness and efficiency. Experimental results on two datasets demonstrate that *`Self-DC`* can achieve comparable or even better performance with far fewer external calls than several strong baselines.
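The adaptive loop described above (classify each question as known, unknown, or compositional; answer known ones internally, retrieve for unknown ones, and decompose-then-recurse on compositional ones) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify`, `decompose`, `generate`, and `retrieve` are hypothetical stand-ins that a real system would back with LLM prompts and a retriever.

```python
def classify(question, known_facts):
    # Stand-in confidence check: compositional if it joins sub-questions
    # with " and ", known if the model "knows" it, otherwise unknown.
    if " and " in question:
        return "compositional"
    return "known" if question in known_facts else "unknown"

def decompose(question):
    # Stand-in decomposition (a real system would prompt the LLM).
    return question.split(" and ")

def generate(question, known_facts):
    # Internal reasoning (generate-then-read): answer from parametric knowledge.
    return known_facts[question]

def retrieve(question, corpus, calls):
    # External acting (retrieve-then-read): each lookup costs one external call.
    calls.append(question)
    return corpus.get(question, "not found")

def self_dc(question, known_facts, corpus, calls):
    kind = classify(question, known_facts)
    if kind == "known":
        return generate(question, known_facts)
    if kind == "unknown":
        return retrieve(question, corpus, calls)
    # Compositional: divide, solve each sub-question, then combine.
    subs = decompose(question)
    return "; ".join(self_dc(s, known_facts, corpus, calls) for s in subs)

# Toy usage: only the unknown sub-question triggers an external call.
known = {"capital of France": "Paris"}
corpus = {"2023 Turing Award winner": "Avi Wigderson"}
calls = []
answer = self_dc("capital of France and 2023 Turing Award winner",
                 known, corpus, calls)
```

The point of the sketch is the cost model: external calls are incurred only for sub-questions the model cannot answer internally, which is the effectiveness-efficiency trade-off the framework targets.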