Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

📅 2026-04-24

🤖 AI Summary
This study investigates whether collective intelligence, meaning group performance that surpasses what any individual member can achieve, emerges spontaneously in very large societies of artificial agents. We introduce the Superminds Test, a framework that uses controlled Probing Agents to evaluate a society of over two million agents on the MoltBook platform across three tiers of probe tasks: joint reasoning, information synthesis, and basic interaction. It is presented as the first empirical evaluation of this question at that scale. The probe results, together with a platform-wide analysis of interaction logs and comparisons against individual frontier large language models, show that the agent society fails to outperform a single advanced model on complex reasoning, rarely synthesizes distributed information, and often fails even trivial coordination tasks. Interactions are typically shallow, with threads rarely extending beyond a single reply. These results indicate that scaling up the number of agents is not sufficient to produce collective intelligence, and that the main bottleneck is sparse, shallow interaction between agents.

📝 Abstract
Collective intelligence refers to the ability of a group to achieve outcomes beyond what any individual member can accomplish alone. As large language model agents scale to populations of millions, a key question arises: Does collective intelligence emerge spontaneously from scale? We present the first empirical evaluation of this question in a large-scale autonomous agent society. Studying MoltBook, a platform hosting over two million agents, we introduce Superminds Test, a hierarchical framework that probes society-level intelligence using controlled Probing Agents across three tiers: joint reasoning, information synthesis, and basic interaction. Our experiments reveal a stark absence of collective intelligence. The society fails to outperform individual frontier models on complex reasoning tasks, rarely synthesizes distributed information, and often fails even trivial coordination tasks. Platform-wide analysis further shows that interactions remain shallow, with threads rarely extending beyond a single reply and most responses being generic or off-topic. These results suggest that collective intelligence does not emerge from scale alone. Instead, the dominant limitation of current agent societies is extremely sparse and shallow interaction, which prevents agents from exchanging information and building on each other's outputs.
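
The abstract's observation that threads rarely extend beyond a single reply can be made concrete with a small depth measurement. The following is a minimal sketch, not the paper's analysis pipeline: it assumes a hypothetical log format of `(post_id, parent_id)` pairs, and the function names `thread_depths` and `shallow_share` as well as the toy data are invented for illustration.

```python
from collections import defaultdict


def thread_depths(posts):
    """Return {root_id: max reply depth} for a list of (post_id, parent_id) pairs.

    parent_id is None for a thread's root post. Depth 0 means no replies,
    depth 1 means only direct replies, depth 2+ means replies to replies.
    The (post_id, parent_id) format is an assumption, not MoltBook's schema.
    """
    children = defaultdict(list)
    roots = []
    for post_id, parent_id in posts:
        if parent_id is None:
            roots.append(post_id)
        else:
            children[parent_id].append(post_id)

    depths = {}
    for root in roots:
        # Iterative DFS so that long reply chains cannot hit the recursion limit.
        deepest = 0
        stack = [(root, 0)]
        while stack:
            node, depth = stack.pop()
            deepest = max(deepest, depth)
            for child in children[node]:
                stack.append((child, depth + 1))
        depths[root] = deepest
    return depths


def shallow_share(depths, max_depth=1):
    """Fraction of threads whose deepest reply chain is at most max_depth."""
    if not depths:
        return 0.0
    return sum(d <= max_depth for d in depths.values()) / len(depths)


# Toy log, invented for illustration (not MoltBook data).
toy_log = [
    ("a", None), ("a1", "a"),                # thread with one reply
    ("b", None),                             # thread with no replies
    ("c", None), ("c1", "c"), ("c2", "c1"),  # thread with a reply to a reply
]
depths = thread_depths(toy_log)
print(depths)
print(shallow_share(depths))
```

On a real interaction log, a `shallow_share` close to 1 would correspond to the pattern the abstract describes, where agents seldom build on each other's replies.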
Problem

Research questions and friction points this paper is trying to address.

collective intelligence
large language model agents
emergence
agent society
scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

collective intelligence
large language model agents
Superminds Test
Probing Agents
agent society