AI Summary
To address the challenge of real-time collaborative 3D semantic mapping with heterogeneous multi-robot data under unknown initial poses and dynamic device motion, this paper proposes the first prior-free online collaborative mapping framework. Methodologically, it introduces a prior-free frame alignment module for spatiotemporal registration across devices; an online semantic Gaussian splatting training scheme that integrates CLIP-based open-vocabulary embeddings with incremental optimization; and support for ROS-based distributed communication and fusion of heterogeneous SLAM outputs. Key contributions include: (i) the first end-to-end, initialization-free multi-robot semantic Gaussian mapping architecture; and (ii) a cross-modal adaptive preprocessing and semantic distillation mechanism. Experiments show that the method achieves twofold higher reconstruction fidelity than baseline approaches and enables fine-grained semantic navigation tasks, e.g., "go beside the sofa."
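As an illustration of how an open-vocabulary query such as "go beside the sofa" could be resolved against a semantic Gaussian map, here is a minimal sketch, not the paper's implementation. It assumes each Gaussian stores a semantic code and that a text query has already been embedded into the same space; the 3-dimensional vectors are toy stand-ins for real CLIP embeddings, and the names `cosine`, `query_map`, and the `threshold` value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_map(text_embedding, gaussians, threshold=0.8):
    """Return positions of Gaussians whose semantic code matches the query.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    return [
        g["position"]
        for g in gaussians
        if cosine(text_embedding, g["code"]) >= threshold
    ]

# Toy map: two Gaussians with sofa-like codes and one unrelated Gaussian.
gaussians = [
    {"position": (1.0, 0.5, 0.0), "code": (0.9, 0.1, 0.0)},
    {"position": (1.2, 0.6, 0.0), "code": (0.8, 0.2, 0.1)},
    {"position": (4.0, 3.0, 0.0), "code": (0.0, 0.1, 0.9)},
]

# Assumed embedding of the text query "sofa" (toy values).
sofa_query = (1.0, 0.0, 0.0)

hits = query_map(sofa_query, gaussians)

# A navigation goal could then be the centroid of the matching Gaussians.
goal = tuple(sum(p[i] for p in hits) / len(hits) for i in range(3))
```

Thresholding cosine similarity is only one possible design; a real system might instead rank Gaussians by similarity or compare against negative prompts.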
Abstract
3D Gaussian Splatting offers expressive scene reconstruction, modeling a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based collaborative Gaussian Splatting method that leverages widely available ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data streams, with no prior knowledge of initial robot positions and with varying on-device pose estimators. HAMMER consists of (i) a frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for open-vocabulary language queries. In our real-world experiments, HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is useful for downstream tasks, such as semantic goal-conditioned navigation (e.g., "go to the couch"). Accompanying content available at hammer-project.github.io.
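To make the frame alignment idea concrete, the following minimal sketch shows how a pose reported in a robot's local SLAM frame can be re-expressed in a shared global frame once a local-to-global rigid transform is available. The abstract does not describe how HAMMER estimates that transform, so it is simply assumed as given here; the helper names `make_transform`, `matmul4`, and `to_global` are hypothetical, and rotations are restricted to yaw for brevity.

```python
import math

def make_transform(yaw, tx, ty, tz):
    """Build a 4x4 homogeneous transform: rotation by `yaw` about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]

def to_global(T_global_local, T_local_camera):
    """Express a camera pose given in a robot's local SLAM frame in the global frame."""
    return matmul4(T_global_local, T_local_camera)

# Assumed alignment result: the local frame is rotated 90 degrees about z
# and offset 2 m along the global x-axis.
T_global_local = make_transform(math.pi / 2, 2.0, 0.0, 0.0)

# Camera pose from on-device SLAM: 1 m along the local x-axis, no rotation.
T_local_camera = make_transform(0.0, 1.0, 0.0, 0.0)

T_global_camera = to_global(T_global_local, T_local_camera)

# Translation column of the resulting pose, i.e. the camera position in the global frame.
position = [row[3] for row in T_global_camera[:3]]
```

In this toy case the camera ends up at approximately (2, 1, 0) in the global frame: the local 1 m offset along x is rotated onto the global y-axis and then shifted by the 2 m alignment offset.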