LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

📅 2025-01-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited use of geometric and semantic information in LiDAR pretraining, which stems from reliance on a single sparse voxel representation, this paper proposes LiMoE, a multi-representation learning framework for autonomous driving. It integrates the Mixture of Experts (MoE) architecture to jointly model three complementary LiDAR representations: range images, sparse voxels, and raw point clouds. After an image-to-LiDAR pretraining stage that transfers image priors to each representation, two mixture stages follow: (i) Contrastive Mixture Learning (CML) aligns the representations contrastively, uses MoE to adaptively mix their features, and distills the mixed features into a unified 3D network; and (ii) Semantic Mixture Supervision (SMS) combines semantic logits from all representations to improve downstream segmentation. Experiments on 11 large-scale LiDAR datasets show improved 3D semantic segmentation accuracy. The code and pretrained models are publicly released.
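As a rough illustration of what contrastive alignment between two representations can look like, the sketch below computes an InfoNCE-style loss over paired per-point features. It is not taken from the LiMoE code: the function names, the temperature value, and the toy feature vectors are all assumptions made for this example, and the paper's actual loss may differ.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss. anchors[i] and positives[i] are features of the
    same point seen through two different representations (hypothetical
    pairing); every other positives[j] acts as a negative."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

# Toy features for two points (illustrative values only).
range_feats = [[1.0, 0.0], [0.0, 1.0]]
voxel_aligned = [[0.9, 0.1], [0.1, 0.9]]   # matches range_feats point by point
voxel_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # pairs deliberately swapped

loss_aligned = info_nce(range_feats, voxel_aligned)
loss_shuffled = info_nce(range_feats, voxel_shuffled)
```

The loss is small when features of the same point agree across representations and large when they do not, which is the behavior a cross-representation consistency objective relies on.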

📝 Abstract

LiDAR data pretraining offers a promising approach to leveraging large-scale, readily available datasets for enhanced data utilization. However, existing methods predominantly focus on the sparse voxel representation, overlooking the complementary attributes provided by other LiDAR representations. In this work, we propose LiMoE, a framework that integrates the Mixture of Experts (MoE) paradigm into LiDAR data representation learning to synergistically combine multiple representations, such as range images, sparse voxels, and raw points. Our approach consists of three stages: i) Image-to-LiDAR Pretraining, which transfers prior knowledge from images to point clouds across different representations; ii) Contrastive Mixture Learning (CML), which uses MoE to adaptively activate relevant attributes from each representation and distills these mixed features into a unified 3D network; iii) Semantic Mixture Supervision (SMS), which combines semantic logits from multiple representations to boost downstream segmentation performance. Extensive experiments across 11 large-scale LiDAR datasets demonstrate the effectiveness and superiority of our approach. The code and model checkpoints have been made publicly accessible.
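The mixing step shared by stages ii) and iii) can be pictured as a softmax-gated weighted sum over per-representation outputs. The following is a minimal sketch under stated assumptions: the gate scores are passed in by hand (in a trained model they would come from a learned gating network), and `softmax`, `mix_experts`, and the toy values are hypothetical names and numbers, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mix_experts(expert_outputs, gate_scores):
    """Combine one point's per-representation vectors (features or class
    logits) into a single vector, weighted by softmax-normalized gate scores."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    mixed = [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]
    return mixed, weights

# Toy two-class logits for one point from three representations
# (range image, sparse voxel, raw points); values are illustrative only.
point_logits = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]

# Equal gate scores give a plain average of the three outputs.
uniform_mix, uniform_w = mix_experts(point_logits, [0.0, 0.0, 0.0])

# A gate that strongly favors the first representation.
range_mix, range_w = mix_experts(point_logits, [10.0, 0.0, 0.0])
```

With equal gate scores the result is the mean of the three outputs; skewing the gate toward one representation pulls the mixed vector toward that representation's output, which is the sense in which a gate can "adaptively activate" a representation per point.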

Problem

Research questions and friction points this paper is trying to address.

LiDAR pretraining

sparse voxel representation

LiDAR information understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

LiMoE Framework

Multi-modal Fusion

Mixture of Experts (MoE)

🔎 Similar Papers

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations