🤖 AI Summary
Problem: Applying deep reasoning uniformly to every task makes large language models (LLMs) computationally expensive, even though many queries do not require it.
Method: This paper proposes a collaborative agent system that pairs a lightweight small model with a resource-intensive large LLM: the small model generates an initial answer, and the large LLM performs deep reasoning only when that answer fails verification or the small model's confidence is low, enabling conditional, on-demand deep reasoning. The approach combines an answer verification mechanism with a confidence-aware trigger to dynamically balance computational efficiency and reasoning accuracy.
Contribution/Results: Experiments show that on simple questions the approach reduces the large LLM's computational cost by more than 50% with negligible accuracy loss, while maintaining robust performance on complex tasks. This establishes an empirically validated hybrid reasoning paradigm that improves both efficiency and controllability in LLM-based inference systems.
📝 Abstract
Recent advances in Large Language Models (LLMs) demonstrate that chain-of-thought prompting and deep reasoning substantially enhance performance on complex tasks, and multi-agent systems can further improve accuracy by enabling model debates. However, applying deep reasoning to all problems is computationally expensive. To mitigate these costs, we propose a complementary agent system integrating small and large LLMs. The small LLM first generates an initial answer, which is then verified by the large LLM. If correct, the answer is adopted directly; otherwise, the large LLM performs in-depth reasoning. Experimental results show that, for simple problems, our approach reduces the computational cost of the large LLM by more than 50% with negligible accuracy loss, while consistently maintaining robust performance on complex tasks.
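The cascade described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `small_answer`, `verify`, `deep_reason`, and the confidence threshold are all hypothetical stand-ins for the actual model calls and tuning the authors use.

```python
# Minimal sketch of the small/large LLM cascade, assuming hypothetical
# callables in place of real model APIs.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class CascadeResult:
    answer: str
    used_deep_reasoning: bool  # True only when the large LLM had to reason in depth

def cascade(question: str,
            small_answer: Callable[[str], Tuple[str, float]],
            verify: Callable[[str, str], bool],
            deep_reason: Callable[[str], str],
            confidence_threshold: float = 0.8) -> CascadeResult:
    """Small model answers first; the expensive deep-reasoning path runs
    only if verification fails or the small model's confidence is low."""
    answer, confidence = small_answer(question)
    if confidence >= confidence_threshold and verify(question, answer):
        return CascadeResult(answer, used_deep_reasoning=False)
    return CascadeResult(deep_reason(question), used_deep_reasoning=True)

# Toy usage with stub models: a confident, verified answer skips deep reasoning.
result = cascade(
    "2 + 2 = ?",
    small_answer=lambda q: ("4", 0.95),
    verify=lambda q, a: a == "4",
    deep_reason=lambda q: "4 (after deep reasoning)",
)
print(result.answer, result.used_deep_reasoning)  # → 4 False
```

On simple inputs like the one above, the deep-reasoning branch is never taken, which is where the claimed cost savings come from; a low-confidence or unverified answer would instead fall through to `deep_reason`.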