Dual Latent Memory for Visual Multi-agent System

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scalability bottleneck in vision-based multi-agent systems, where text-based communication often creates an information bottleneck that degrades performance as the number of agents increases—a phenomenon known as the “scaling wall.” To overcome this limitation, the authors propose the L²-VMAS framework, which introduces a novel dual latent memory architecture to decouple perception from reasoning and incorporates an entropy-driven active triggering mechanism for on-demand, efficient memory sharing. The approach is model-agnostic and demonstrates consistent improvements across diverse backbone networks and multi-agent configurations, achieving average accuracy gains of 2.7–5.4% while reducing communication token usage by 21.3–44.8%, thereby significantly enhancing system scalability.

📝 Abstract
While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasing agent turns often degrades performance while exponentially inflating token costs. We attribute this failure to the information bottleneck inherent in text-centric communication, where converting perceptual and thinking trajectories into discrete natural language inevitably induces semantic loss. To this end, we propose L$^{2}$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories. Furthermore, we decouple perception from thinking while dynamically synthesizing the dual latent memories. Additionally, we introduce an entropy-driven proactive triggering mechanism that replaces passive information transmission with efficient, on-demand memory access. Extensive experiments across backbones, model sizes, and multi-agent structures demonstrate that our method effectively breaks the "scaling wall" with superb scalability, improving average accuracy by 2.7-5.4% while reducing token usage by 21.3-44.8%. Codes: https://github.com/YU-deep/L2-VMAS.
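The abstract describes an entropy-driven trigger that requests memory access only when an agent is uncertain, rather than broadcasting information every turn. The paper's exact criterion is not given here; a minimal sketch under the common assumption of thresholding the Shannon entropy of the agent's next-token distribution (threshold value hypothetical) might look like:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_trigger(probs, threshold=1.0):
    """Request shared latent memory only when the agent is uncertain.

    `threshold` is a hypothetical hyperparameter, not taken from the paper.
    """
    return token_entropy(probs) > threshold

# A confident (peaked) distribution stays below the threshold: no memory access.
print(should_trigger([0.97, 0.01, 0.01, 0.01]))  # False
# A near-uniform (uncertain) distribution exceeds it: trigger memory access.
print(should_trigger([0.25, 0.25, 0.25, 0.25]))  # True
```

On-demand triggering of this kind is how the framework can cut communication tokens: most turns produce low-entropy (confident) predictions and skip the memory exchange entirely.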
Problem

Research questions and friction points this paper is trying to address.

Visual Multi-Agent Systems
scaling wall
information bottleneck
semantic loss
token cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual latent memory
visual multi-agent system
information bottleneck
entropy-driven triggering
model-agnostic framework