🤖 AI Summary
3D point cloud semantic segmentation generalizes poorly across domains because the data come from heterogeneous sources (e.g., LiDAR vs. depth cameras, indoor vs. outdoor scenes) and domain labels are typically unavailable at inference time. To address this, the work introduces Point-MoE, which brings sparse-gated Mixture-of-Experts (MoE) to 3D point cloud understanding for the first time. Built upon backbone networks such as PointNet++ and PAConv, it uses a lightweight top-k routing mechanism that enables end-to-end expert selection and joint cross-domain training without domain labels. The method achieves significant improvements over state-of-the-art multi-domain approaches on multiple heterogeneous domain benchmarks. Notably, it attains a +12.7% zero-shot mIoU gain on unseen domains, demonstrating strong generalization capability, scalability, and practical deployment potential.
📝 Abstract
While scaling laws have transformed natural language processing and computer vision, 3D point cloud understanding has yet to reach that stage. This can be attributed both to the comparatively small scale of 3D datasets and to the disparate sources of the data itself. Point clouds are captured by diverse sensors (e.g., depth cameras, LiDAR) across varied domains (e.g., indoor, outdoor), each introducing unique scanning patterns, sampling densities, and semantic biases. Such domain heterogeneity poses a major barrier to training unified models at scale, especially under the realistic constraint that domain labels are typically inaccessible at inference time. In this work, we propose Point-MoE, a Mixture-of-Experts architecture designed to enable large-scale, cross-domain generalization in 3D perception. We show that standard point cloud backbones degrade significantly in performance when trained on mixed-domain data, whereas Point-MoE with a simple top-k routing strategy can automatically specialize experts, even without access to domain labels. Our experiments demonstrate that Point-MoE not only outperforms strong multi-domain baselines but also generalizes better to unseen domains. This work highlights a scalable path forward for 3D understanding: letting the model discover structure in diverse 3D data, rather than imposing it via manual curation or domain supervision.
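To make the top-k routing idea in the abstract concrete, here is a minimal sketch of generic sparse top-k gating for a single point feature. It is not the paper's implementation: the function names (`top_k_route`, `linear`, `moe_layer`), the use of plain linear maps as experts, and the softmax renormalization over the selected experts are all illustrative assumptions. Note that the router sees only the feature vector, never a domain label.

```python
import math


def top_k_route(logits, k):
    """Pick the k largest gate logits and softmax-normalize over them only.

    Returns (expert_indices, weights). Hypothetical helper, not from the paper.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def linear(weight, x):
    """Apply a weight matrix (a list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def moe_layer(x, gate_weight, expert_weights, k=2):
    """Sparse MoE for one point feature x.

    The gate scores every expert, but only the k selected experts are
    evaluated. Experts are plain linear maps here (an assumption made
    purely to keep the sketch short).
    """
    logits = linear(gate_weight, x)          # one gate score per expert
    indices, weights = top_k_route(logits, k)
    out = [0.0] * len(expert_weights[0])     # output dim = rows of an expert
    for i, w in zip(indices, weights):
        y = linear(expert_weights[i], x)     # run only the chosen expert
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, indices, weights
```

Calling `moe_layer(x, gate_weight, expert_weights, k=2)` returns the mixed output together with the chosen expert indices and their weights, so one can inspect which experts a given point was routed to. In a trained model the gate and experts would be learned jointly, and a real layer would batch this over all points.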