🤖 AI Summary
This work addresses the scalability challenge of individual continual learning and collective co-evolution in decentralized, asynchronous, goal-free, and bandwidth-constrained multi-agent reinforcement learning (MARL). We propose the first controller-free modular knowledge-sharing framework: it leverages Wasserstein embeddings to measure policy similarity and select knowledge for dynamic composition; introduces a neural-mask-driven asynchronous policy-ensemble mechanism; and induces an emergent curriculum from easy to hard tasks. Evaluated on multiple standard RL benchmarks, our method significantly improves sample efficiency, in some cases solving tasks that are only achievable through collaboration, so that individual policy performance and emergent collective capabilities improve concurrently. To our knowledge, this is the first approach to achieve robust, scalable cooperative learning dynamics in fully decentralized settings, without global coordination, explicit reward shaping, or centralized training.
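The mask-driven composition mentioned above can be illustrated with a minimal sketch: each task-specific policy is represented as a binary mask over a shared, frozen backbone weight matrix, and a composed policy mixes several masked views. All names, shapes, and the linear mixing rule here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared frozen backbone layer; per-task knowledge lives only in the masks.
backbone = rng.standard_normal((4, 4))
mask_a = (rng.random((4, 4)) > 0.5).astype(float)  # mask learned on task A
mask_b = (rng.random((4, 4)) > 0.5).astype(float)  # mask learned on task B

def compose(masks, weights):
    """Linearly combine masked views of the backbone into one policy layer."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize mixing coefficients
    return sum(w * (m * backbone) for w, m in zip(weights, masks))

layer = compose([mask_a, mask_b], [0.7, 0.3])
print(layer.shape)  # (4, 4)
```

Because only the lightweight masks (not the backbone) would need to be exchanged, this style of factorization is one way a bandwidth-constrained agent could share and reuse policies.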
📝 Abstract
Agentic AI has gained significant interest as a research paradigm focused on autonomy, self-directed learning, and long-term reliability of decision making. Real-world agentic systems operate in decentralized settings on a large set of tasks or data distributions with constraints such as limited bandwidth, asynchronous execution, and the absence of a centralized model or even common objectives. We posit that exploiting previously learned skills, task similarities, and communication capabilities in a collective of agentic AI is challenging but essential to enabling scalability, open-endedness, and beneficial collaborative learning dynamics. In this paper, we introduce Modular Sharing and Composition in Collective Learning (MOSAIC), an agentic algorithm that allows multiple agents to independently solve different tasks while also identifying, sharing, and reusing useful machine-learned knowledge, without coordination, synchronization, or centralized control. MOSAIC combines three mechanisms: (1) modular policy composition via neural network masks, (2) cosine similarity estimation using Wasserstein embeddings for knowledge selection, and (3) asynchronous communication and policy integration. Results on a set of RL benchmarks show that MOSAIC has greater sample efficiency than isolated learners, i.e., it learns significantly faster, and in some cases finds solutions to tasks that cannot be solved by isolated learners. The collaborative learning and sharing dynamics are also observed to result in the emergence of ideal curricula of tasks, from easy to hard. These findings support the case for collaborative learning in agentic systems to achieve better and continuously evolving performance at both the individual and collective levels.
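Mechanism (2), cosine similarity over Wasserstein embeddings, can be sketched for the 1-D case, where the quantile function evaluated on a fixed grid is an exact Wasserstein-2 embedding (L2 distance between embeddings equals the W2 distance between distributions). The embedding dimension, sample counts, and the direct use of raw return samples are assumptions for illustration only, not the paper's construction.

```python
import numpy as np

def wasserstein_embedding(samples, n_quantiles=16):
    # For 1-D empirical distributions, the quantile function on a fixed grid
    # is a Wasserstein-2 embedding into Euclidean space.
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(np.asarray(samples, dtype=float), qs)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
# Three hypothetical task distributions: two nearly identical, one far away.
e1 = wasserstein_embedding(rng.normal(0.0, 1.0, 500))
e2 = wasserstein_embedding(rng.normal(0.1, 1.0, 500))
e3 = wasserstein_embedding(rng.normal(5.0, 1.0, 500))

# Similar tasks score higher, so an agent would request knowledge from the
# peer whose embedding is most aligned with its own.
print(cosine_similarity(e1, e2) > cosine_similarity(e1, e3))  # True
```

An agent could broadcast only its low-dimensional embedding and compute similarities locally, which keeps knowledge selection compatible with the bandwidth and decentralization constraints described above.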