ModalPatch: A Plug-and-Play Module for Robust Multi-Modal 3D Object Detection under Modality Drop

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical vulnerability of multimodal 3D object detection in autonomous driving systems, where transient sensor failures—particularly simultaneous loss of multiple modalities—can cause severe performance degradation or even complete system failure. To mitigate this issue, the authors propose ModalPatch, a plug-and-play module that compensates for missing modalities without requiring modifications to the backbone architecture or retraining. The core innovation lies in leveraging temporal historical features to predict absent modalities and integrating an uncertainty-guided cross-modal fusion mechanism that dynamically suppresses unreliable signals while enhancing informative features. Extensive experiments demonstrate that ModalPatch significantly improves the robustness and accuracy of state-of-the-art 3D detectors across diverse modality-missing scenarios, confirming its generality and effectiveness.

📝 Abstract
Multi-modal 3D object detection is pivotal for autonomous driving, integrating complementary sensors like LiDAR and cameras. However, its real-world reliability is challenged by transient data interruptions and missing modalities, where sensor inputs can momentarily drop due to hardware glitches, adverse weather, or occlusions. This poses a critical risk, especially during a simultaneous drop of multiple modalities, when the vehicle is momentarily blind. To address this problem, we introduce ModalPatch, the first plug-and-play module designed to enable robust detection under arbitrary modality-drop scenarios. Without requiring architectural changes or retraining, ModalPatch can be seamlessly integrated into diverse detection frameworks. Technically, ModalPatch leverages the temporal nature of sensor data for perceptual continuity, using a history-based module to predict and compensate for transiently unavailable features. To improve the fidelity of the predicted features, we further introduce an uncertainty-guided cross-modality fusion strategy that dynamically estimates the reliability of compensated features, suppressing biased signals while reinforcing informative ones. Extensive experiments show that ModalPatch consistently enhances both the robustness and accuracy of state-of-the-art 3D object detectors under diverse modality-drop conditions.
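The two mechanisms the abstract names can be illustrated with a minimal sketch. The paper does not specify the form of its history-based predictor or its uncertainty estimator, so the decayed-average predictor and inverse-variance fusion below are stand-in assumptions, not the authors' implementation; all function names and parameters here are hypothetical.

```python
import numpy as np

def compensate_missing(history, decay=0.7):
    """Predict a dropped modality's feature from its recent history.

    Uses an exponentially decayed average over past feature maps
    (a stand-in for the paper's unspecified history-based predictor).
    `history` is a list of same-shaped arrays, most recent last.
    """
    weights = np.array([decay ** i for i in range(len(history))][::-1])
    weights /= weights.sum()  # normalize so recent frames dominate
    return np.tensordot(weights, np.stack(history), axes=1)

def uncertainty_guided_fuse(feat_a, feat_b, var_a, var_b, eps=1e-6):
    """Fuse two modality features with inverse-variance weighting.

    A higher-variance (less reliable) feature, e.g. one that was
    compensated rather than observed, is suppressed, while the more
    certain feature is reinforced.
    """
    w_a = 1.0 / (var_a + eps)
    w_b = 1.0 / (var_b + eps)
    return (w_a * feat_a + w_b * feat_b) / (w_a + w_b)

# Toy usage: the camera stream drops for one frame, so its feature is
# predicted from history and fused with the live LiDAR feature, with a
# large variance assigned to the compensated branch.
camera_history = [np.zeros(4), np.ones(4)]       # oldest -> newest
camera_pred = compensate_missing(camera_history)
lidar_feat = np.ones(4)
fused = uncertainty_guided_fuse(lidar_feat, camera_pred,
                                var_a=0.1, var_b=10.0)
```

Inverse-variance weighting is the standard way to combine estimates of differing reliability; whatever the paper's actual fusion operator is, it plays the same role of down-weighting the compensated, higher-uncertainty branch.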
Problem

Research questions and friction points this paper is trying to address.

multi-modal
3D object detection
modality drop
autonomous driving
sensor fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

ModalPatch
modality drop
temporal feature compensation
uncertainty-guided fusion
plug-and-play module