UP-Fuse: Uncertainty-guided LiDAR-Camera Fusion for 3D Panoptic Segmentation

šŸ“… 2026-02-22
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This work addresses the degradation of LiDAR-camera fusion for 3D panoptic segmentation under adverse conditions such as camera sensor failure and calibration drift. To mitigate this issue, the authors propose an uncertainty-guided multimodal fusion framework that estimates uncertainty from representation discrepancy in the 2D range view. A dynamic cross-modal fusion module adaptively integrates features, complemented by a hybrid 2D–3D Transformer decoder that alleviates the spatial ambiguity introduced by 2D projection. Extensive experiments show that the method achieves state-of-the-art performance on Panoptic nuScenes, SemanticKITTI, and a newly introduced Panoptic Waymo benchmark, with markedly improved robustness under severe visual disturbance or sensor misalignment.
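
The paper's code is not reproduced on this page, but the core idea of uncertainty-gated fusion can be sketched. Below is a minimal PyTorch-style illustration, assuming both modalities are already aligned on the same range-view grid; the module name `UncertaintyGatedFusion`, the layer choices, and the sigmoid gating are hypothetical assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UncertaintyGatedFusion(nn.Module):
    """Hypothetical sketch: a small head predicts a per-pixel uncertainty
    map for the camera branch from the concatenated LiDAR and camera
    range-view features, then uses it to down-weight unreliable visual
    cues before fusing the two modalities."""

    def __init__(self, channels: int):
        super().__init__()
        # Uncertainty head: consumes both modality features, outputs u in (0, 1),
        # where higher u means the camera features are deemed less reliable.
        self.uncertainty_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor):
        # Both features share the range-view grid: (B, C, H, W).
        u = self.uncertainty_head(torch.cat([lidar_feat, cam_feat], dim=1))
        gated_cam = (1.0 - u) * cam_feat  # suppress degraded visual cues
        fused = self.fuse(torch.cat([lidar_feat, gated_cam], dim=1))
        return fused, u
```

In this reading, when the camera is corrupted or misaligned, u approaches 1 and the fused representation degrades gracefully toward the LiDAR-only features, which matches the robustness behavior the summary describes.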

šŸ“ Abstract
LiDAR-camera fusion enhances 3D panoptic segmentation by leveraging camera images to complement sparse LiDAR scans, but it also introduces a critical failure mode. Under adverse conditions, degradation or failure of the camera sensor can significantly compromise the reliability of the perception system. To address this problem, we introduce UP-Fuse, a novel uncertainty-aware fusion framework in the 2D range view that remains robust under camera sensor degradation, calibration drift, and sensor failure. Raw LiDAR data is first projected into the range view and encoded by a LiDAR encoder, while camera features are simultaneously extracted and projected into the same shared space. At its core, UP-Fuse employs an uncertainty-guided fusion module that dynamically modulates cross-modal interaction using predicted uncertainty maps. These maps are learned by quantifying representational divergence under diverse visual degradations, ensuring that only reliable visual cues influence the fused representation. The fused range-view features are decoded by a novel hybrid 2D–3D Transformer that mitigates spatial ambiguities inherent to the 2D projection and directly predicts 3D panoptic segmentation masks. Extensive experiments on Panoptic nuScenes, SemanticKITTI, and our introduced Panoptic Waymo benchmark demonstrate the efficacy and robustness of UP-Fuse, which maintains strong performance even under severe visual corruption or misalignment, making it well suited for robotic perception in safety-critical settings.
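
For readers unfamiliar with the range-view representation the abstract relies on, the standard spherical projection that maps a LiDAR scan onto a 2D image grid can be sketched as follows. This is the common formulation used by range-view methods in general, not code from this paper; the resolution and vertical field-of-view defaults are assumptions roughly matching a 64-beam sensor.

```python
import numpy as np

def lidar_to_range_view(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 4) LiDAR point cloud (x, y, z, intensity) onto an
    (H, W, 5) range image (range, x, y, z, intensity) via the standard
    spherical projection used by range-view methods."""
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = abs(fov_up_rad) + abs(fov_down_rad)

    depth = np.linalg.norm(points[:, :3], axis=1)               # range r
    yaw = np.arctan2(points[:, 1], points[:, 0])                # azimuth
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))   # elevation

    # Normalize angles to [0, 1] image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi)                   # column (azimuth)
    v = 1.0 - (pitch + abs(fov_down_rad)) / fov     # row (elevation)

    cols = np.clip((u * W).astype(np.int32), 0, W - 1)
    rows = np.clip((v * H).astype(np.int32), 0, H - 1)

    # Keep the closest point per pixel: write in order of decreasing depth
    # so the nearest point lands last and wins.
    order = np.argsort(depth)[::-1]
    image = np.zeros((H, W, 5), dtype=np.float32)
    image[rows[order], cols[order], 0] = depth[order]
    image[rows[order], cols[order], 1:] = points[order]
    return image
```

Camera features lifted into this same grid give the shared space the abstract refers to, which is also where the projection's spatial ambiguity arises: multiple 3D points can compete for one pixel, motivating the hybrid 2D–3D decoder.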
Problem

Research questions and friction points this paper is trying to address.

LiDAR-camera fusion
3D panoptic segmentation
sensor degradation
uncertainty
robust perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty-guided fusion
LiDAR-camera fusion
3D panoptic segmentation
range-view representation
robust perception