EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

📅 2026-05-11

🤖 AI Summary

Existing test-time multi-agent evolution methods struggle to balance cross-agent learning with collaborative specialization. They either keep experience private to each agent or broadcast it to every agent. This work proposes EVOCHAMBER, a training-free co-evolution framework that operates at the individual, team, and population levels. Failure-driven collaborative reflection and asymmetric knowledge transfer build specialized structure dynamically, so specialist agents can emerge without erasing the diversity that makes collaboration useful. The framework combines the CODREAM protocol, online team assembly, and population lifecycle operations (forking, merging, pruning, and seeding). On three heterogeneous task streams with Qwen3-8B, it reaches 63.9% on competition mathematics, 75.7% on code generation, and 87.1% on multi-domain reasoning. On mathematics this is a 32% relative improvement over the best baseline, and four to five stable specialist agents emerge from identically initialized agents.
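The lifecycle operations are named here but not specified. As a rough illustration only, the following sketch runs one round of prune, merge, fork, and seed over an agent pool under assumed rules: each agent tracks per-niche success counts; agents with at least 5 attempts and overall accuracy under 0.3 are pruned; experienced agents sharing the same best niche are merged into the strongest one; a niche leader at 0.8 accuracy or above is forked into a new agent with its memory but fresh counters; and blank agents are seeded to keep a minimum pool size. `Agent`, `lifecycle_step`, and every threshold are hypothetical and are not taken from the EvoChamber code.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

_ids = itertools.count()  # hypothetical source of unique agent ids


@dataclass(eq=False)  # eq=False: agents compare by identity, not by field values
class Agent:
    # Hypothetical record: per-niche success/attempt counts plus distilled insights.
    name: str
    wins: Dict[str, int] = field(default_factory=dict)
    tries: Dict[str, int] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)

    def acc(self, niche: str) -> float:
        t = self.tries.get(niche, 0)
        return self.wins.get(niche, 0) / t if t else 0.0

    def total_tries(self) -> int:
        return sum(self.tries.values())

    def overall(self) -> float:
        t = self.total_tries()
        return sum(self.wins.values()) / t if t else 0.0

    def best_niche(self) -> Optional[str]:
        return max(self.tries, key=self.acc) if self.tries else None


def lifecycle_step(pop: List[Agent], niches: List[str], min_tries: int = 5,
                   prune_below: float = 0.3, fork_above: float = 0.8,
                   max_size: int = 8, min_size: int = 3) -> List[Agent]:
    # 1) Prune: drop agents with enough attempts and low overall accuracy.
    pop = [a for a in pop
           if a.total_tries() < min_tries or a.overall() >= prune_below]

    # 2) Merge: experienced agents sharing a best niche collapse into the strongest,
    #    which absorbs any of the others' insights it does not already hold.
    groups: Dict[str, List[Agent]] = {}
    for a in pop:
        if a.total_tries() >= min_tries:
            groups.setdefault(a.best_niche(), []).append(a)
    for niche, group in groups.items():
        group.sort(key=lambda a: a.acc(niche), reverse=True)
        keeper = group[0]
        for other in group[1:]:
            keeper.memory += [m for m in other.memory if m not in keeper.memory]
            pop.remove(other)

    # 3) Fork: while there is room, copy a strong niche leader's memory into a
    #    new agent with fresh counters, so it must earn its own record.
    for niche in niches:
        if len(pop) >= max_size:
            break
        leaders = [a for a in pop if a.tries.get(niche, 0) >= min_tries]
        if leaders:
            best = max(leaders, key=lambda a: a.acc(niche))
            if best.acc(niche) >= fork_above:
                pop.append(Agent(name=f"{best.name}/fork-{next(_ids)}",
                                 memory=copy.deepcopy(best.memory)))

    # 4) Seed: top the pool back up with blank agents if pressure shrank it.
    while len(pop) < min_size:
        pop.append(Agent(name=f"seed-{next(_ids)}"))
    return pop
```

For example, with a strong math agent, a weaker math agent, and a poor code agent, one step prunes the code agent, merges the weaker math agent's insights into the stronger one, forks the math leader, and seeds one blank agent to restore the minimum size. The order of operations and the thresholds are design choices of this sketch, not of the paper.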

📝 Abstract

We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast them symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math, with ablations identifying asymmetric cross-agent transfer as the primary driver. From several identically initialized agents, four to five stable niche specialists emerge spontaneously, a structural signature of multi-agent evolution that no single-agent learner can express. Our code is available at https://github.com/Mercury7353/EvoChamber
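To make the asymmetric routing in CODREAM concrete, here is a minimal sketch under stated assumptions. The trigger follows the abstract (team failure or any disagreement among answers). `reflect` and `distill` are hypothetical stand-ins for the LLM calls that produce reflections and condense them into insights, and "strong" versus "weak" is assumed to mean at-or-above versus below the team median skill on the failed niche. `Member`, `should_dream`, and `co_dream` are illustrative names, not the EvoChamber API.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Callable, Dict, List


@dataclass(eq=False)
class Member:
    # Hypothetical agent state: a skill estimate and an insight memory per niche.
    name: str
    skill: Dict[str, float] = field(default_factory=dict)
    memory: Dict[str, List[str]] = field(default_factory=dict)


def should_dream(answers: List[str], team_correct: bool) -> bool:
    # Trigger on team failure, or on any disagreement among member answers.
    return (not team_correct) or len(set(answers)) > 1


def co_dream(team: List[Member], niche: str, answers: List[str], team_correct: bool,
             reflect: Callable[[Member, str, str], str],
             distill: Callable[[List[str]], List[str]]) -> List[Member]:
    """Hypothetical post-task step: reflect, distill, then route insights strong -> weak."""
    if not should_dream(answers, team_correct):
        return []
    # Every member reflects on the episode; `reflect` stands in for an LLM call.
    reflections = [reflect(m, niche, a) for m, a in zip(team, answers)]
    # `distill` stands in for condensing reflections into reusable insights.
    insights = distill(reflections)
    # Asymmetric routing: only members below the team median on this niche
    # receive the insights; stronger members' memories are left untouched.
    cut = median(m.skill.get(niche, 0.0) for m in team)
    recipients = [m for m in team if m.skill.get(niche, 0.0) < cut]
    for m in recipients:
        mem = m.memory.setdefault(niche, [])
        for ins in insights:
            if ins not in mem:
                mem.append(ins)
    return recipients
```

Routing only to below-median members is one simple way to fill knowledge gaps without overwriting the niche memories of strong agents, which a symmetric broadcast would do. Note that under this rule, members with identical skill (as at initialization) receive nothing, so the criterion the paper actually uses may differ.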

Problem

multi-agent evolution

test-time adaptation

emergent specialization

asymmetric knowledge transfer

collaborative learning

Innovation

test-time evolution

multi-agent coevolution

asymmetric knowledge transfer