GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

📅 2025-11-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Embodied multi-agent systems face challenges in harmonizing local perception with global understanding and suffer from limited scalability. Method: This paper proposes a decentralized, RGB-only approach to constructing globally consistent 3D Gaussian splatting fields. Leveraging a Gaussian-image co-representation mechanism, it introduces attribute redistribution of 3D Gaussians into multi-agent collaboration for the first time, enabling distributed scene reconstruction and task-relevant feature sharing solely from monocular RGB inputs. The method integrates multi-view geometric fusion, distributed feature reprojection, and diffusion-policy-driven imitation learning—requiring no additional sensors. Results: Evaluated on the RoboFactory benchmark, our approach achieves performance comparable to point-cloud-based baselines and substantially outperforms existing pure-image methods. Crucially, it demonstrates superior scalability with increasing agent count, validating its efficacy for large-scale embodied multi-agent coordination.
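The paper does not include code on this page. As a rough illustration of the core idea described above — a shared 3D Gaussian field whose attributes are redistributed to each agent's local viewpoint — here is a minimal sketch, assuming pinhole cameras and using hypothetical helper names (`project_gaussians`, `redistribute`); it only assigns Gaussian centers to agents by visibility, not the paper's full feature-sharing mechanism:

```python
import numpy as np

def project_gaussians(means_world, K, R, t):
    """Project 3D Gaussian centers into one agent's pinhole camera.

    means_world: (N, 3) Gaussian centers in the world frame.
    K: (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation.
    Returns pixel coordinates (N, 2) and per-Gaussian depths (N,).
    """
    cam = means_world @ R.T + t          # world frame -> camera frame
    depths = cam[:, 2]
    pix = cam @ K.T                      # apply intrinsics
    pix = pix[:, :2] / pix[:, 2:3]       # perspective divide
    return pix, depths

def redistribute(means_world, agents, width, height):
    """Toy 'attribute redistribution': each agent keeps the indices of the
    shared Gaussians that fall inside its image plane with positive depth."""
    assignment = {}
    for name, (K, R, t) in agents.items():
        pix, depth = project_gaussians(means_world, K, R, t)
        visible = (depth > 0) \
            & (pix[:, 0] >= 0) & (pix[:, 0] < width) \
            & (pix[:, 1] >= 0) & (pix[:, 1] < height)
        assignment[name] = np.flatnonzero(visible)
    return assignment
```

In this sketch each agent queries the same globally consistent set of Gaussians from its own pose, which is the synergy the summary describes: one shared scene representation, many local perspectives.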

📝 Abstract
Effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes 3D Gaussian attributes to each agent's local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. This design facilitates both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point clouds). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.
Problem

Research questions and friction points this paper is trying to address.

Balancing individual agent perspectives with global environmental awareness in multi-agent systems
Overcoming limitations in fine-grained local control and comprehensive scene understanding
Enabling scalable perception-aware imitation learning without additional sensing modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian-image synergy enables scalable multi-agent collaboration
Globally consistent 3D Gaussian field from decentralized RGB observations
Dynamic redistribution of 3D attributes maintains individual viewpoints
Ziye Wang
China University of Geosciences
Mathematical Geosciences
Li Kang
Shanghai Jiao Tong University
Yiran Qin
The Chinese University of Hong Kong, Shenzhen
Jiahua Ma
Sun Yat-sen University
Zhanglin Peng
The University of Hong Kong
Lei Bai
Shanghai AI Laboratory
Foundation Model · Science Intelligence · Multi-Agent System · Autonomous Discovery
Ruimao Zhang
Sun Yat-sen University