🤖 AI Summary
Can collaborative orchestration of multiple open-source large language models (LLMs) systematically outperform state-of-the-art closed-source models?
Method: This paper introduces SMACS, a scalable multi-agent collaboration system featuring a novel “retrieval-based prior selection + exploration-exploitation-driven posterior enhancement” mechanism. SMACS dynamically selects optimal models and generates high-quality, diverse outputs by integrating retrieval-augmented model selection, agent scoring, prior pruning, and hybrid posterior scoring across 15 open-source LLMs.
Contribution/Results: On eight mainstream benchmarks, SMACS significantly surpasses leading 2025 closed-source models, achieving +12.73% over Claude-3.7-Sonnet and +5.36% over GPT-4.1, while establishing a new average performance ceiling across both open- and closed-source models. This work provides the first systematic empirical validation that coordinated open-source LLMs can collectively transcend the fundamental limitations of individual models.
📝 Abstract
This paper aims to demonstrate the potential and strengths of open-source collectives. This leads to a promising question: can we harness multiple open-source LLMs to match or even beat closed-source LLMs? To answer this, we propose SMACS, a high-performance, scalable multi-agent collaboration system (MACS) framework. Specifically, to support continuous integration of new LLMs and generalization to diverse questions, we first propose Retrieval-based Prior Selection (RPS), which assigns a proxy performance score to each LLM to select the Top-k LLMs at the instance level for any given question. We then propose Exploration-Exploitation-Driven Posterior Enhancement (EPE), which encourages diverse responses through prior dropping and selects the highest-quality response via a hybrid posterior score. Experiments on eight mainstream benchmarks validate the effectiveness of SMACS: by integrating fifteen open-source LLMs, SMACS outperforms leading closed-source LLMs in 2025, e.g., Claude-3.7-Sonnet (+12.73%), GPT-4.1 (+5.36%), and GPT-o3-mini (+5.28%), across multiple tasks. Remarkably, it even exceeds the average of the best per-dataset results from both open-source LLMs (+2.86%) and closed-source LLMs (+2.04%), pushing the upper bound of intelligence. Code will be released at https://github.com/magent4aci/SMACS.
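The two-stage pipeline described above (RPS selects Top-k LLMs per question via proxy scores; EPE then picks the best response via a hybrid posterior) could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, the similarity-weighted proxy score, and the agreement-plus-quality posterior are all assumptions on my part.

```python
import math
from collections import defaultdict

def retrieval_prior_selection(question_emb, bank, k=3):
    """Hypothetical RPS: score each LLM by its historical correctness on
    retrieved similar questions, then keep the top-k LLMs.
    `bank` maps llm_name -> list of (embedding, correctness) records."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scores = {}
    for llm, records in bank.items():
        # similarity-weighted proxy performance score for this question
        weighted = [(cosine(question_emb, emb), c) for emb, c in records]
        total = sum(w for w, _ in weighted)
        scores[llm] = sum(w * c for w, c in weighted) / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:k]

def posterior_enhancement(responses, quality, alpha=0.5):
    """Hypothetical EPE posterior: blend agreement frequency across the
    selected LLMs' responses with an external quality score, and return
    the response with the highest hybrid score."""
    counts = defaultdict(int)
    for r in responses:
        counts[r] += 1
    n = len(responses)
    return max(responses,
               key=lambda r: alpha * counts[r] / n + (1 - alpha) * quality[r])
```

In this sketch, prior dropping (the exploration side of EPE) would amount to randomly omitting some retrieved priors before generation so the selected LLMs produce more diverse candidate responses; only the posterior-scoring side is shown here.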