🤖 AI Summary
Existing large language model (LLM) collaboration frameworks predominantly rely on server-side integration, which makes them ill-suited to real-world internet architectures, where a small number of servers serve a very large number of resource-constrained clients.
Method: This paper proposes CoLM, the first client-server collaborative inference framework, in which lightweight client models autonomously refine their outputs under the guidance of high-quality responses shared by servers. CoLM introduces a low-overhead communication protocol, a dynamic output aggregation module, and a cross-model knowledge guidance strategy, enabling asynchronous, scalable collaboration that also extends to vision-language models.
Contribution/Results: On multiple benchmarks, CoLM significantly improves client models’ accuracy on questions they previously answered incorrectly, demonstrating that collaborative guidance effectively enhances the capabilities of individual models. It provides a novel, efficient deployment paradigm for LLMs in resource-constrained edge scenarios.
📝 Abstract
Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively generate responses, effectively operating in a server-to-server paradigm. However, such approaches do not align well with practical deployment settings, where a limited number of server-side models are shared by many clients under modern internet architectures. In this paper, we introduce CoLM (Collaboration in Large-Models), a novel framework for collaborative reasoning that redefines cooperation among large models from a client-server perspective. Unlike traditional ensemble methods that rely on simultaneous inference from multiple models to produce a single output, CoLM allows the outputs of multiple models to be aggregated or shared, enabling each client model to independently refine and update its own generation based on these high-quality outputs. This design enables collaborative benefits by fully leveraging both client-side and shared server-side models. We further extend CoLM to vision-language models (VLMs), demonstrating its applicability beyond language tasks. Experimental results across multiple benchmarks show that CoLM consistently improves model performance on previously failed queries, highlighting the effectiveness of collaborative guidance in enhancing single-model capabilities.
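The client-server flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `aggregate`, `refine`, and `colm_round`, the use of majority voting as the aggregation rule, and the prompt wording are all assumptions made for illustration, and a "model" is represented simply as a function from a prompt string to an answer string.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Assumption: a "model" is any callable mapping a prompt to an answer string.
Model = Callable[[str], str]

# Hypothetical marker used to pass shared server output to the client.
GUIDANCE_MARKER = "Reference answer from shared models: "


def aggregate(server_answers: List[str]) -> str:
    """Combine server outputs by majority vote (an assumed aggregation rule)."""
    if not server_answers:
        raise ValueError("at least one server answer is required")
    return Counter(server_answers).most_common(1)[0][0]


def refine(client: Model, question: str, guidance: str) -> str:
    """Ask the client model to revise its answer given the shared guidance."""
    prompt = f"{question}\n{GUIDANCE_MARKER}{guidance}\nRevise your answer."
    return client(prompt)


def colm_round(client: Model, servers: List[Model], question: str) -> Tuple[str, str]:
    """Run one illustrative round: client draft, server aggregation, client refinement."""
    draft = client(question)
    guidance = aggregate([server(question) for server in servers])
    refined = refine(client, question, guidance)
    return draft, refined
```

In this sketch the client produces its own draft first and then generates the refined answer itself, which mirrors the abstract's point that each client independently updates its own generation instead of receiving a single ensemble output.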