🤖 AI Summary
Embodied multi-agent systems face challenges in harmonizing local perception with global understanding and suffer from limited scalability. Method: This paper proposes a decentralized, RGB-only approach to constructing globally consistent 3D Gaussian splatting fields. Leveraging a Gaussian-image co-representation mechanism, it is the first to introduce attribute redistribution of 3D Gaussians into multi-agent collaboration, enabling distributed scene reconstruction and task-relevant feature sharing solely from monocular RGB inputs. The method integrates multi-view geometric fusion, distributed feature reprojection, and diffusion-policy-driven imitation learning, requiring no additional sensors. Results: Evaluated on the RoboFactory benchmark, the approach matches point-cloud-based baselines and substantially outperforms existing pure-image methods. Crucially, it scales well as the agent count grows, validating its efficacy for large-scale embodied multi-agent coordination.
📝 Abstract
Effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly when agents must balance individual perspectives with global environmental awareness. Existing approaches often trade fine-grained local control against comprehensive scene understanding, resulting in limited scalability and degraded collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that enables scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations and dynamically redistributes the 3D Gaussian attributes to each agent's local perspective, allowing every agent to adaptively query task-critical features from the shared scene representation while retaining its individual viewpoint. This design yields both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point clouds). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method outperforms existing image-based methods and approaches the effectiveness of point-cloud-driven ones, while maintaining strong scalability as the number of agents increases.
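To make the redistribution idea concrete, here is a minimal geometric sketch in NumPy: shared 3D Gaussian centers and their per-Gaussian feature vectors are projected into one agent's pinhole camera, and only the attributes that fall inside that agent's view are handed back to it. Everything here (the function name `redistribute_features`, the camera parameters, the visibility test) is an illustrative assumption; GauDP's actual redistribution mechanism is learned and is not specified at this level of detail in the abstract.

```python
import numpy as np

def redistribute_features(means, feats, K, w2c, img_hw):
    """Illustrative sketch: project shared 3D Gaussian centers into one
    agent's camera and return the subset of features visible in its view.

    means  : (N, 3) Gaussian centers in world coordinates
    feats  : (N, D) per-Gaussian feature vectors
    K      : (3, 3) camera intrinsics
    w2c    : (4, 4) world-to-camera extrinsics
    img_hw : (H, W) image size in pixels
    """
    H, W = img_hw
    # World -> camera coordinates via a homogeneous transform.
    pts_h = np.concatenate([means, np.ones((len(means), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6
    # Pinhole projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    # Keep only Gaussians that land inside this agent's image plane.
    visible = (in_front
               & (uv[:, 0] >= 0) & (uv[:, 0] < W)
               & (uv[:, 1] >= 0) & (uv[:, 1] < H))
    return uv[visible], feats[visible]
```

In this simplified picture, each agent queries the shared field with its own camera pose and receives only the task-relevant attributes it can see, which is the intuition behind combining a global representation with per-agent local perspectives.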