VaLID: Verification as Late Integration of Detections for LiDAR-Camera Fusion

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LiDAR-camera fusion methods for multimodal autonomous driving often rely on joint training and camera-specific architectures, and exhibit poor generalization. To address this, we propose VaLID, a verification-based late-fusion framework. Its core innovation is a model-agnostic detection verification paradigm: it leverages outputs from arbitrary camera detectors—including generic or open-vocabulary models—to assess the acceptability of LiDAR-detected 3D bounding boxes, eliminating the need for joint training or detector customization. A lightweight, high-recall-biased MLP serves as the verifier, enabling cross-modal confidence calibration at the late stage. On KITTI, VaLID reduces false positive rates by 63.9% on average and achieves 3D AP surpassing unimodal baselines. Remarkably, it attains state-of-the-art performance using off-the-shelf, non-customized camera detectors—significantly enhancing deployment flexibility and cross-domain generalization capability.

📝 Abstract
Vehicle object detection benefits from both LiDAR and camera data, with LiDAR offering superior performance in many scenarios. Fusion of these modalities further enhances accuracy, but existing methods often introduce complexity or dataset-specific dependencies. In our study, we propose a model-adaptive late-fusion method, VaLID, which validates whether each predicted bounding box is acceptable or not. Our method verifies the higher-performing, yet overly optimistic LiDAR model detections using camera detections that are obtained from either specially trained, general, or open-vocabulary models. VaLID uses a simple multi-layer perceptron trained with a high recall bias to reduce the false predictions made by the LiDAR detector, while still preserving the true ones. Evaluating with multiple combinations of LiDAR and camera detectors on the KITTI dataset, we reduce false positives by an average of 63.9%, thus outperforming the individual detectors on 3D average precision (3DAP). Our approach is model-adaptive and demonstrates state-of-the-art competitive performance even when using generic camera detectors that were not trained specifically for this dataset.
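The verification idea in the abstract — keep a LiDAR detection only if a small MLP, fed cross-modal agreement features, scores it as acceptable — can be sketched in miniature. The feature choice (LiDAR confidence, best 2D IoU against any camera box, and that camera box's confidence), the network size, and the hand-set weights below are all illustrative assumptions, not the paper's actual design:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two 2D boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def verifier_features(lidar_conf, lidar_box_2d, cam_dets):
    """Feature vector for one LiDAR detection: its own confidence,
    best IoU with any camera box, and that camera box's confidence.
    (Illustrative feature choice, not the paper's exact inputs.)"""
    best_iou, best_conf = 0.0, 0.0
    for cam_box, cam_conf in cam_dets:
        iou = box_iou(lidar_box_2d, cam_box)
        if iou > best_iou:
            best_iou, best_conf = iou, cam_conf
    return np.array([lidar_conf, best_iou, best_conf])

class TinyMLPVerifier:
    """Two-layer MLP scoring whether a LiDAR box should be kept."""
    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def score(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)          # ReLU hidden layer
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))  # sigmoid output

# Hand-set weights (hypothetical) so the toy verifier keys mainly on
# camera agreement; a real verifier would be trained on labeled pairs.
mlp = TinyMLPVerifier(
    w1=np.array([[1.0, 0.0], [4.0, 0.0], [2.0, 0.0]]),
    b1=np.array([-0.8, 0.0]),
    w2=np.array([2.0, 0.0]),
    b2=-1.0,
)

cam_dets = [((100, 100, 200, 200), 0.9)]
supported = verifier_features(0.8, (110, 105, 195, 210), cam_dets)
unsupported = verifier_features(0.8, (400, 300, 480, 380), cam_dets)

# A low keep-threshold encodes the high-recall bias: only LiDAR boxes
# with clearly no camera support are rejected as false positives.
KEEP_THRESHOLD = 0.4
print(mlp.score(supported) > KEEP_THRESHOLD)    # True: camera agrees
print(mlp.score(unsupported) > KEEP_THRESHOLD)  # False: no camera support
```

The asymmetric threshold is the key design point: the verifier is tuned to rarely reject a true LiDAR detection, so fused 3D AP cannot fall below the LiDAR baseline by much, while unsupported boxes are pruned.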
Problem

Research questions and friction points this paper is trying to address.

LiDAR-Camera Fusion
Vehicle Detection
Accuracy Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Late Fusion
LiDAR-Camera Calibration
Accuracy Improvement
Vanshika Vats
Doctoral Student at University of California, Santa Cruz
Computer VisionMachine LearningDeep LearningVision for Autonomous Vehicles
Marzia Binta Nizam
Department of Computer Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
James Davis
Department of Computer Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA