🤖 AI Summary
Existing work lacks a unified theoretical framework for predicting the performance upper bounds of multi-LLM collaboration (e.g., model routing, post-hoc ensembling).
Method: We propose the first general scaling law for model ensembles, establishing a power-law relationship between collective performance and total parameter count. The approach introduces an idealized oracle ensemble whose per-sample loss is the minimum cross-entropy loss across the model pool, adopts a method-agnostic formulation, and is validated through rigorous scaling analysis and power-law fitting.
Contributions/Results: (1) We provide the first theoretical argument that multi-model system performance scales as a power law in total parameters, with a provable loss lower bound strictly below that of the optimal single model; (2) We refute the assumption that same-family ensembles suffice, demonstrating that heterogeneous ensembles significantly improve both the scaling exponent and the absolute performance gain. This law establishes a foundational theoretical basis for principled scale design and fundamental-limit assessment in multi-model collaborative systems.
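The oracle formulation above can be sketched in a few lines. This is a minimal illustration with made-up per-sample losses (not data from the paper): the oracle ensemble assigns each sample the minimum loss achieved by any model in the pool, which by construction can never exceed the best single model's average loss.

```python
import numpy as np

# Hypothetical per-sample cross-entropy losses for 3 models over 5 samples.
# Rows: models in the pool; columns: evaluation samples. Values are illustrative.
losses = np.array([
    [2.1, 0.9, 1.5, 3.0, 1.2],   # model A
    [1.8, 1.4, 0.7, 2.5, 1.9],   # model B
    [2.5, 1.1, 1.3, 1.0, 0.8],   # model C
])

# Idealized integration oracle: each sample's loss is the minimum loss
# achieved by any model in the pool (a method-agnostic upper bound on
# what any concrete routing/ensembling scheme could attain).
oracle_per_sample = losses.min(axis=0)
oracle_loss = oracle_per_sample.mean()

# The oracle loss is bounded above by the best single model's average loss.
best_single = losses.mean(axis=1).min()
print(f"oracle: {oracle_loss:.2f}, best single model: {best_single:.2f}")
```

Because the minimum is taken per sample rather than per model, the oracle benefits whenever different models err on different samples, which is exactly why diversity in the pool matters.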
📝 Abstract
Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM are inherently bounded. One promising direction is to exploit interactions among multiple LLMs, so that their collective performance surpasses that of any constituent model. Despite the rapid proliferation of multi-model integration techniques such as model routing and post-hoc ensembling, a unifying theoretical framework of performance scaling for multi-model collaboration remains absent. In this work, we propose the Law of Multi-model Collaboration, a scaling law that predicts the performance limits of LLM ensembles based on their aggregated parameter budget. To quantify the intrinsic upper bound of multi-model collaboration, we adopt a method-agnostic formulation and assume an idealized integration oracle in which the cross-entropy loss of each sample equals the minimum loss achieved by any model in the pool. Experimental results reveal that multi-model systems follow a power-law scaling with respect to the total parameter count, exhibiting a steeper improvement trend and a lower theoretical loss floor than single-model scaling. Moreover, ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family, indicating that model diversity is a primary driver of collaboration gains. These findings suggest that model collaboration represents a critical axis for extending the intelligence frontier of LLMs.
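The power-law fitting the abstract refers to can be illustrated with a small sketch. A power law $L(N) = A \cdot N^{-\alpha}$ is linear in log-log space, so a least-squares line fit recovers the scaling exponent $\alpha$. The parameter counts and losses below are synthetic stand-ins, not measurements from the paper, and the irreducible loss floor is set to zero for simplicity.

```python
import numpy as np

# Hypothetical (total parameters, ensemble loss) points for pools of
# increasing aggregate size; generated from an assumed power law.
n_params = np.array([1e9, 3e9, 1e10, 3e10, 1e11])
loss = 4.0 * n_params ** -0.1          # L(N) = A * N^(-alpha), A=4.0, alpha=0.1

# In log-log space: log L = log A - alpha * log N, a straight line.
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
alpha, A = -slope, np.exp(intercept)
print(f"alpha = {alpha:.3f}, A = {A:.2f}")
```

With real measurements one would typically also fit an additive constant for the irreducible loss floor, which requires a nonlinear fit rather than this log-linear shortcut; the sketch only shows how the scaling exponent is read off.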