Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Can collaborative orchestration of multiple open-source large language models (LLMs) systematically outperform state-of-the-art closed-source models? Method: This paper introduces SMACS, a scalable multi-agent collaboration system featuring a novel “retrieval-based prior selection + exploration-exploitation-driven posterior enhancement” mechanism. SMACS dynamically selects optimal models and generates high-quality, diverse outputs by integrating retrieval-augmented model selection, agent scoring, prior pruning, and hybrid posterior scoring across 15 open-source LLMs. Contribution/Results: On eight mainstream benchmarks, SMACS significantly surpasses leading 2025 closed-source models—achieving +12.73% over Claude-3.7-Sonnet and +5.36% over GPT-4.1—while establishing a new average performance ceiling across both open- and closed-source models. This work provides the first systematic empirical validation that coordinated open-source LLMs can collectively transcend fundamental limitations of individual models.

📝 Abstract
This paper aims to demonstrate the potential and strengths of open-source collectives. It leads to a promising question: can we harness multiple open-source LLMs to match or even beat the closed-source LLMs? To answer this, we propose SMACS, a scalable multi-agent collaboration system (MACS) framework with high performance. Specifically, for continuous integration of new LLMs and generalization to diverse questions, we first propose Retrieval-based Prior Selection (RPS), which assigns a proxy performance score to each LLM so that the Top-k LLMs can be selected at the instance level for any given question. Then, we propose Exploration-Exploitation-Driven Posterior Enhancement (EPE), which encourages diverse responses through prior dropping and selects the highest-quality response via a hybrid posterior score. Experiments on eight mainstream benchmarks validate the effectiveness of SMACS: by integrating fifteen open-source LLMs, it outperforms leading closed-source LLMs of 2025, e.g., Claude-3.7-Sonnet (+12.73%), GPT-4.1 (+5.36%), and GPT-o3-mini (+5.28%), across multiple tasks. Remarkably, it even exceeds the average of per-dataset best results from both open-source LLMs (+2.86%) and closed-source LLMs (+2.04%), pushing the upper bound of intelligence. Code will be released at https://github.com/magent4aci/SMACS.
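The RPS step described in the abstract can be illustrated with a minimal sketch: retrieve past questions similar to the incoming one, use each LLM's recorded correctness on those neighbors as a proxy performance score, and keep the Top-k LLMs. The function name, cosine-similarity retrieval, and mean-correctness proxy are illustrative assumptions; the paper's actual scoring details may differ.

```python
import math

def select_top_llms(q_emb, bank, top_n_neighbors=3, top_k=2):
    """Retrieval-based Prior Selection (RPS), minimal sketch.

    bank: list of (embedding, per_llm_scores) pairs for past questions,
          where per_llm_scores[i] is LLM i's recorded correctness (0/1).
    Returns the indices of the top_k LLMs by proxy performance score.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb + 1e-9)

    # retrieve the most similar past questions to the query
    neighbors = sorted(bank, key=lambda item: cos(q_emb, item[0]),
                       reverse=True)[:top_n_neighbors]
    # proxy score per LLM = mean correctness on the retrieved neighbors
    n_llms = len(neighbors[0][1])
    proxy = [sum(item[1][i] for item in neighbors) / len(neighbors)
             for i in range(n_llms)]
    # keep the Top-k LLMs for this particular question
    return sorted(range(n_llms), key=lambda i: proxy[i], reverse=True)[:top_k]
```

Because selection happens per question (instance level), a newly added LLM only needs its scores logged on the question bank to participate, which matches the "continuous integration of new LLMs" goal.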
Problem

Research questions and friction points this paper is trying to address.

Can multiple open-source LLMs outperform closed-source LLMs?
How can new LLMs be continuously integrated while generalizing to diverse questions?
How can response quality be improved via exploration-exploitation strategies?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable multi-agent collaboration system (SMACS)
Retrieval-based Prior Selection (RPS) for Top-k LLMs
Exploration-Exploitation-Driven Posterior Enhancement (EPE)
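The EPE contribution above (diverse candidate responses ranked by a hybrid posterior score) can be sketched as follows. The `alpha`-weighted mix of cross-candidate agreement and an external judge score is an assumption standing in for the paper's hybrid posterior score, whose exact form is not given here.

```python
from collections import Counter

def hybrid_posterior_select(candidates, judge_scores, alpha=0.5):
    """Pick the best response via a hybrid posterior score (sketch).

    candidates:   response strings from the selected LLMs (diversity may
                  come from prior dropping, i.e., varying which priors
                  each agent sees).
    judge_scores: external quality score in [0, 1] per candidate,
                  e.g., from an LLM judge.
    alpha:        balance between agreement and judge quality.
    """
    counts = Counter(candidates)
    n = len(candidates)
    best, best_score = None, -1.0
    for resp, judge in zip(candidates, judge_scores):
        consistency = counts[resp] / n        # agreement among candidates
        score = alpha * consistency + (1 - alpha) * judge
        if score > best_score:
            best, best_score = resp, score
    return best
```

With `alpha` near 1 the selector behaves like majority voting (exploitation of agreement); lower `alpha` lets a rare but high-quality response win (exploration).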
Shengji Tang
CUHK & Fudan University & Shanghai AI Lab
machine learning · model compression · model design
Jianjian Cao
Fudan University
Multimodal Learning · Model Compression · MLLM
Weihao Lin
PhD Student, Fudan University
Deep learning · Computer vision · Video understanding · Model compression
Jiale Hong
Shanghai Jiao Tong University
Bo Zhang
Shanghai Artificial Intelligence Laboratory
Shuyue Hu
Shanghai Artificial Intelligence Lab
multi-agent system · large language model · game theory
Lei Bai
Shanghai AI Laboratory
Foundation Model · Science Intelligence · Multi-Agent System · Autonomous Discovery
Tao Chen
Fudan University
Wanli Ouyang
Shanghai Artificial Intelligence Laboratory, The Chinese University of Hong Kong
Peng Ye
Shanghai Artificial Intelligence Laboratory, The Chinese University of Hong Kong