🤖 AI Summary
This work addresses a limitation of existing self-evolving multi-agent systems: they rely on a centralized memory shared across agents and consequently suffer from high communication and coordination overhead, privacy risks, and reduced agent diversity. To overcome these challenges, the authors propose DecentMem, a decentralized framework in which each agent keeps its own dual-pool memory, consisting of an exploitation pool of consolidated past trajectories and an exploration pool of LLM-generated candidates for unseen contexts. An LLM-as-a-judge provides stage-wise feedback that is used to reweight the two pools online, so each agent continues to learn from experience while preserving its autonomy. Theoretical analysis shows that DecentMem guarantees global reachability of the solution space and achieves a cumulative regret bound of O(log T). Across three MAS frameworks, five backbone models, and five benchmarks, DecentMem improves average accuracy by up to 23.8% over the strongest centralized-memory baseline and by up to 52.5% over the no-memory baseline, while reducing token usage by up to 49%.
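The summary does not specify how the judge's feedback reweights the two pools. One way to picture it, as a minimal sketch under our own assumptions rather than DecentMem's actual update rule, is to treat the exploitation and exploration pools as the two arms of a bandit and choose between them with UCB1, a standard algorithm whose expected regret grows as O(log T). `Pool`, `DualPoolMemory`, `stub_judge`, `simulate`, the 0.5 consolidation threshold, and the per-pool success rates are all hypothetical, and the judge is a random stub in place of an LLM call.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Pool:
    """One memory pool, treated as one arm of a two-armed bandit."""

    name: str
    entries: list = field(default_factory=list)
    pulls: int = 0           # how many stages drew context from this pool
    reward_sum: float = 0.0  # sum of judge scores for those stages

    def mean(self) -> float:
        return self.reward_sum / self.pulls if self.pulls else 0.0


class DualPoolMemory:
    """Hypothetical per-agent memory; UCB1 stands in for DecentMem's reweighting rule."""

    def __init__(self, c: float = 2.0) -> None:
        self.exploit = Pool("exploit")  # consolidated past trajectories
        self.explore = Pool("explore")  # generated candidates for unseen contexts
        self.c = c  # exploration constant; c = 2 gives classic UCB1

    def select_pool(self) -> Pool:
        pools = (self.exploit, self.explore)
        for pool in pools:  # draw from each pool once before using confidence bounds
            if pool.pulls == 0:
                return pool
        t = sum(p.pulls for p in pools)  # stages completed so far
        return max(pools, key=lambda p: p.mean() + math.sqrt(self.c * math.log(t) / p.pulls))

    def update(self, pool: Pool, judge_score: float, trajectory: str,
               threshold: float = 0.5) -> None:
        """Record a judge score in [0, 1] and consolidate well-judged trajectories."""
        pool.pulls += 1
        pool.reward_sum += judge_score
        if judge_score >= threshold:  # hypothetical consolidation rule
            self.exploit.entries.append(trajectory)


def stub_judge(pool_name: str, rng: random.Random) -> float:
    """Stand-in for an LLM-as-a-judge call; the success rates are made up."""
    p_success = 0.8 if pool_name == "exploit" else 0.4
    return 1.0 if rng.random() < p_success else 0.0


def simulate(rounds: int = 2000, seed: int = 0) -> DualPoolMemory:
    rng = random.Random(seed)
    memory = DualPoolMemory()
    for step in range(rounds):
        pool = memory.select_pool()
        if pool is memory.explore:
            # placeholder for an LLM-generated candidate for an unseen context
            memory.explore.entries.append(f"candidate-{step}")
        score = stub_judge(pool.name, rng)
        memory.update(pool, score, trajectory=f"{pool.name}-trajectory-{step}")
    return memory


if __name__ == "__main__":
    mem = simulate()
    print("exploit draws:", mem.exploit.pulls, "explore draws:", mem.explore.pulls)
```

With these made-up success rates, `simulate()` should draw from the exploitation pool far more often than from the exploration pool, while still sampling the exploration pool occasionally; under UCB1 the number of draws from the worse pool grows only logarithmically in the number of rounds, which is the source of its O(log T) regret.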
📝 Abstract
Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhead, raising privacy concerns, and collapsing agent diversity. We propose DecentMem, a decentralized memory framework in which each agent maintains its own dual-pool memory: an exploitation pool of consolidated past trajectories and an exploration pool of LLM-generated candidates for unseen contexts. The two pools are reweighted online based on stage-wise feedback from an LLM-as-a-judge. Theoretically, we prove that this design guarantees global reachability of the solution space and achieves $O(\log T)$ cumulative regret, matching the stochastic bandit lower bound up to constants. In practice, across three MAS frameworks (AutoGen, DyLAN, AgentNet), three Qwen3 backbones (4B/8B/14B), two Gemma4 backbones (E2B/E4B), and five benchmarks spanning math, code, QA, and embodied tasks, DecentMem improves average accuracy by up to 23.8% over the strongest centralized memory baseline and by up to 52.5% over the no-memory baseline, while reducing token usage by up to 49%.
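Assuming the "stochastic bandit lower bound" refers to the classical $K$-armed setting (the abstract does not say which formulation the proof uses), the comparison can be written out. With arm means $\mu_i$, optimal mean $\mu^*$, gaps $\Delta_i = \mu^* - \mu_i$, reward distributions $\nu_i$, and $N_i(T)$ the number of pulls of arm $i$ in $T$ rounds, Lai and Robbins showed that every consistent policy (one whose regret is $o(T^a)$ for every $a > 0$ on every instance) incurs logarithmic regret, and UCB1 (Auer, Cesa-Bianchi, and Fischer), for rewards in $[0,1]$, achieves the same order:

$$
\begin{aligned}
R_T &= T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T}\mu_{a_t}\Big] = \sum_{i:\,\Delta_i>0}\Delta_i\,\mathbb{E}[N_i(T)],\\
\liminf_{T\to\infty}\frac{R_T}{\ln T} &\ge \sum_{i:\,\Delta_i>0}\frac{\Delta_i}{\mathrm{KL}(\nu_i,\nu^*)},\\
R_T^{\mathrm{UCB1}} &\le 8\sum_{i:\,\Delta_i>0}\frac{\ln T}{\Delta_i} + \Big(1+\frac{\pi^2}{3}\Big)\sum_{i=1}^{K}\Delta_i.
\end{aligned}
$$

For fixed gaps both bounds scale as $\ln T$, so an $O(\log T)$ upper bound is optimal in its dependence on $T$; only the leading constants differ. For Bernoulli rewards, for example, Pinsker's inequality gives $\Delta_i/\mathrm{KL}(\nu_i,\nu^*) \le 1/(2\Delta_i)$, compared with UCB1's coefficient of $8/\Delta_i$.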