🤖 AI Summary
To address the significant degradation in multimodal fusion performance—and consequently, the poor robustness of 3D object detection—under adverse weather conditions (e.g., dense fog, heavy snow, sensor soiling), this paper proposes a cross-modal fusion framework tailored for extreme environments. The method integrates RGB, LiDAR, near-infrared (NIR) gated imaging, and radar data, incorporating four key innovations: depth-guided feature alignment, attention-driven adaptive weighted fusion, bird's-eye-view (BEV) feature refinement, and a Transformer-based decoder. Crucially, it introduces dynamic modality weighting conditioned on both distance and visibility estimates, substantially enhancing feature complementarity and discriminability under low-visibility conditions. Evaluated on long-range pedestrian detection in dense-fog benchmarks, the framework achieves a 17.2 AP absolute improvement in average precision over the next-best method, effectively bridging the performance gap between ideal laboratory settings and real-world edge cases.
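The core idea of distance- and visibility-conditioned modality weighting can be illustrated with a toy sketch. Note that this is a hypothetical simplification for intuition only, not the paper's learned attention mechanism: the per-modality priors (`base`, `range_decay`, `fog_robustness`) and the softmax-over-reliability formulation below are illustrative assumptions, whereas the actual method learns these weightings end-to-end in the Transformer decoder.

```python
import math

def modality_weights(distance_m, visibility_m, priors):
    """Toy sketch: score each modality's reliability given target distance
    and scene visibility, then normalize the scores with a softmax.
    priors maps modality -> (base, range_decay, fog_robustness in [0, 1])."""
    logits = {}
    for name, (base, range_decay, fog_robust) in priors.items():
        # Modalities with high range_decay lose confidence at distance;
        # high fog_robustness dampens the penalty from poor visibility.
        vis_penalty = (1.0 - fog_robust) * max(0.0, 1.0 - visibility_m / 1000.0)
        logits[name] = base - range_decay * distance_m / 100.0 - vis_penalty
    # Numerically stable softmax over the reliability logits.
    z = max(logits.values())
    exp = {k: math.exp(v - z) for k, v in logits.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}

# Illustrative (assumed) priors: RGB degrades fastest in fog,
# radar is weather-robust but spatially coarse.
priors = {
    "rgb":   (1.0, 0.5, 0.1),
    "lidar": (1.2, 0.8, 0.3),
    "gated": (0.9, 0.4, 0.8),
    "radar": (0.6, 0.1, 0.95),
}

clear_near = modality_weights(distance_m=20, visibility_m=1000, priors=priors)
foggy_far = modality_weights(distance_m=120, visibility_m=80, priors=priors)
```

Under this sketch, weight shifts from RGB/LiDAR toward gated NIR and radar as distance grows and visibility drops, which mirrors the qualitative behavior the paper reports for its learned weighting.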
📝 Abstract
Multimodal sensor fusion is an essential capability for autonomous robots, enabling object detection and decision-making in the presence of failing or uncertain inputs. While recent fusion methods excel in normal environmental conditions, these approaches fail in adverse weather, e.g., heavy fog, snow, or obstructions due to soiling. We introduce a novel multi-sensor fusion approach tailored to adverse weather conditions. In addition to fusing RGB and LiDAR sensors, which are employed in recent autonomous driving literature, our sensor fusion stack is also capable of learning from NIR gated camera and radar modalities to tackle low light and inclement weather. We fuse multimodal sensor data through attentive, depth-based blending schemes, with learned refinement on the Bird's Eye View (BEV) plane to combine image and range features effectively. Our detections are predicted by a transformer decoder that weighs modalities based on distance and visibility. We demonstrate that our method improves the reliability of multimodal sensor fusion in autonomous vehicles under challenging weather conditions, bridging the gap between ideal conditions and real-world edge cases. Our approach improves average precision by 17.2 AP over the next-best method for vulnerable pedestrians at long distances and in challenging foggy scenes. Our project page is available at https://light.princeton.edu/samfusion/