🤖 AI Summary
This paper addresses two key challenges that large language models (LLMs) face in code generation, a prototypical System 2 reasoning task: (1) difficulty in modeling implicit, complex reasoning chains, and (2) poor generalization and robustness stemming from heterogeneous data distributions. To this end, we propose the BDC framework, featuring three novel components: (1) MC-Tree-Of-Agents, which integrates Monte Carlo Tree Search, reflective pruning, and multi-model mutual enhancement to enable verifiable, collaborative reasoning; (2) DisenLora, a disentanglement-based method that decomposes heterogeneous training data and constructs a composable LoRA expert library; and (3) an input-aware hypernetwork that dynamically weights and ensembles the LoRA experts into a customized solver for each input. Evaluated on HumanEval, MBPP, and cross-domain transfer benchmarks, BDC achieves state-of-the-art performance, significantly improving accuracy and robustness under few-shot, multi-distribution, and adversarial perturbation settings.
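To make the MC-Tree-Of-Agents idea concrete, the following is a minimal, self-contained sketch of Monte Carlo Tree Search with reflection-based pruning. It is not the paper's implementation: the `reflect` verifier and `rollout` reward are toy stand-ins for what would, per the summary, be LLM agents cross-checking partial programs; the state encoding and branching factor are arbitrary assumptions for illustration.

```python
import math

class Node:
    """One node in the search tree over partial solutions (state is a toy integer)."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0
        self.pruned = False  # set by the reflection step

def ucb(child, parent_visits, c=1.4):
    """Upper-confidence bound used during selection."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits
    )

def reflect(state):
    """Hypothetical reflection check: a verifier agent would flag unpromising
    partial programs; stubbed here with a toy divisibility rule."""
    return state % 7 != 0

def rollout(state):
    """Hypothetical mutual-verification reward: agents would cross-check a
    completed program against generated tests; stubbed with a toy score."""
    return 1.0 if state % 2 == 0 else 0.0

def mcts(root, iters=200):
    for _ in range(iters):
        node = root
        # Selection: descend via UCB, skipping children pruned by reflection.
        while node.children:
            live = [c for c in node.children if not c.pruned]
            if not live:
                break
            node = max(live, key=lambda c: ucb(c, node.visits + 1))
        # Expansion, with reflection-based pruning of new candidates.
        for action in range(3):
            child = Node(node.state * 3 + action, parent=node)
            child.pruned = not reflect(child.state)
            node.children.append(child)
        # Simulation + backpropagation.
        reward = rollout(node.state)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits)

best = mcts(Node(1))
```

The pruning step is the point of departure from vanilla MCTS: branches the verifier rejects never receive visits, so the search budget concentrates on candidates that survive reflection.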
📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in System 1 tasks, yet their problem-solving mechanisms in System 2 tasks remain insufficiently explored. Research on System2-to-System1 methods has recently surged, exploring System 2 reasoning knowledge via inference-time computation and compressing the explored knowledge into the System 1 process. In this paper, we focus on code generation, a representative System 2 task, and identify two primary challenges: (1) the complex hidden reasoning processes and (2) the heterogeneous data distributions that complicate the exploration and training of robust LLM solvers. To tackle these issues, we propose a novel BDC framework that explores insightful System 2 knowledge of LLMs using an MC-Tree-Of-Agents algorithm with mutual Boosting, Disentangles the heterogeneous training data into composable LoRA experts, and obtains a Customized problem solver for each data instance via an input-aware hypernetwork that weights over the LoRA experts, offering effectiveness, flexibility, and robustness. The framework leverages multiple LLMs through mutual verification and boosting, integrated into a Monte Carlo Tree Search process enhanced by reflection-based pruning and refinement. Additionally, we introduce the DisenLora algorithm, which clusters heterogeneous data to fine-tune LLMs into composable LoRA experts, enabling the adaptive generation of customized problem solvers through an input-aware hypernetwork. This work lays the groundwork for advancing LLM capabilities in complex reasoning tasks, offering a novel System2-to-System1 solution.
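The input-aware composition of LoRA experts described above can be sketched as follows. This is a minimal illustration, not the paper's architecture: the expert library, feature dimensions, and the single-linear-layer hypernetwork are all assumptions; a real system would attach the composed low-rank update to transformer weight matrices rather than a standalone matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, RANK, N_EXPERTS, D_FEAT = 16, 16, 4, 3, 8

# Hypothetical composable LoRA expert library: each expert is a low-rank
# (A_i, B_i) pair, so its weight update is B_i @ A_i.
experts = [
    (rng.standard_normal((RANK, D_IN)) * 0.01,   # A_i: (rank, d_in)
     rng.standard_normal((D_OUT, RANK)) * 0.01)  # B_i: (d_out, rank)
    for _ in range(N_EXPERTS)
]

# Hypothetical input-aware hypernetwork: a single linear map from an
# instance feature vector to softmax weights over the expert library.
W_hyper = rng.standard_normal((N_EXPERTS, D_FEAT))

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def customized_delta(x_feat):
    """Compose a per-instance LoRA update: delta_W = sum_i w_i(x) * B_i @ A_i."""
    w = softmax(W_hyper @ x_feat)
    delta = sum(wi * (B @ A) for wi, (A, B) in zip(w, experts))
    return w, delta

feat = rng.standard_normal(D_FEAT)          # stand-in for an input embedding
w, delta = customized_delta(feat)           # w: (N_EXPERTS,), delta: (D_OUT, D_IN)
```

Because each expert contributes only a rank-`RANK` term, the composed update stays cheap to form per instance, which is what makes the "customized solver per input" design practical.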