🤖 AI Summary
This work addresses how to combine the efficient, private local inference of small language models (SLMs) with the stronger performance of large language models (LLMs). The authors propose a dynamic collaboration framework with a learnable help-seeking mechanism: the small model actively decides when and how to request assistance from the large model, which responds with adaptive feedback. The study reveals a scaling effect that links model capabilities to collaboration strategies, and the learned strategies transfer robustly to unseen large models. Compared with static collaboration or standalone inference, the framework significantly reduces the number of interactions while improving reasoning accuracy, and it remains efficient and privacy-preserving across diverse model pairings.
📝 Abstract
Large language models (LLMs) offer strong capabilities but raise cost and privacy concerns, whereas small language models (SLMs) enable efficient and private local inference yet suffer from limited capacity. To combine these complementary strengths, we introduce a dynamic collaboration framework in which an SLM learns to proactively decide how to request help from an LLM during multi-step reasoning, while the LLM provides adaptive feedback instead of acting as a passive tool. We further systematically investigate how collaboration strategies are shaped by SLM and LLM capabilities as well as by efficiency and privacy constraints. Evaluation results reveal a distinct scaling effect: stronger SLMs become more self-reliant, while stronger LLMs enable fewer and more informative interactions. In addition, the learned dynamic collaboration strategies significantly outperform static pipelines and standalone inference, and transfer robustly to unseen LLMs.
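To make the help-seeking loop described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the authors' implementation: the learned help-seeking policy is replaced by a fixed confidence threshold, and the names `collaborate`, `Trace`, `toy_slm`, `toy_llm`, and `ask_threshold` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """Record of one multi-step reasoning episode."""
    steps: List[str] = field(default_factory=list)
    llm_calls: int = 0  # number of SLM -> LLM help requests


def collaborate(
    slm_step: Callable[[str, List[str]], Tuple[str, float]],
    llm_feedback: Callable[[str, List[str], str], str],
    question: str,
    max_steps: int = 4,
    ask_threshold: float = 0.5,
) -> Trace:
    """Run the SLM step by step, asking the LLM only on low-confidence steps.

    Assumption: a fixed threshold stands in for the learned decision
    of whether to seek help; the paper learns this decision instead.
    """
    trace = Trace()
    for _ in range(max_steps):
        draft, confidence = slm_step(question, trace.steps)
        if confidence < ask_threshold:
            # The SLM seeks help; the LLM returns feedback on the draft step.
            draft = llm_feedback(question, trace.steps, draft)
            trace.llm_calls += 1
        trace.steps.append(draft)
    return trace


def toy_slm(question: str, history: List[str]) -> Tuple[str, float]:
    # Hypothetical stub: confident on even steps, unsure on odd steps.
    idx = len(history)
    return f"slm-step-{idx}", 0.9 if idx % 2 == 0 else 0.2


def toy_llm(question: str, history: List[str], draft: str) -> str:
    # Hypothetical stub: the "feedback" is just a marker appended to the draft.
    return draft + "+llm-hint"


trace = collaborate(toy_slm, toy_llm, "toy question")
print(trace.llm_calls, trace.steps)
```

With these stubs the LLM is consulted only on the two low-confidence steps. Raising `ask_threshold` trades more LLM interactions (higher cost, more data leaving the device) for more help, which is the efficiency and privacy trade-off the abstract refers to.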