LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection

📅 2025-12-05
🤖 AI Summary
Real-time monocular 3D object detection faces fundamental challenges, including depth ambiguity, viewpoint variation, and high computational overhead. This paper proposes a purely image-based, end-to-end framework that requires no LiDAR, stereo inputs, or geometric priors. It achieves Pareto-optimal trade-offs between accuracy and speed through three key innovations: (1) asymmetric augmentation denoising distillation, in which a teacher that sees clean images guides a student trained on mixup-augmented images through a quality- and importance-weighted depth-feature loss; (2) 3D-aware consistent matching, which adds the 3D MGIoU to the prediction-to-ground-truth matching score; and (3) confidence-gated 3D inference, which runs the costly 3D regression only for the most confident regions. The method sets new state-of-the-art accuracy on KITTI and the Waymo Open Dataset, achieves the best reported car AP on Rope3D, and runs up to 3.6× faster than prior high-accuracy methods without compromising detection accuracy.
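
The first component pairs an input-level augmentation (mixup on the student's images only) with a weighted feature-matching loss. Below is a minimal, framework-free sketch of those two pieces. It assumes that student and teacher depth features are already aligned per spatial location, that "quality" and "importance" are per-location scalars, and that the loss is a weighted mean of squared L2 distances. The names `mixup` and `weighted_depth_feature_loss` are hypothetical, and the paper's exact weighting scheme and feature layers are not reproduced here.

```python
from typing import List, Sequence


def mixup(clean: Sequence[float], other: Sequence[float], lam: float) -> List[float]:
    """Blend two equally sized, flattened images: lam * clean + (1 - lam) * other."""
    if len(clean) != len(other):
        raise ValueError("images must have the same number of pixels")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(clean, other)]


def weighted_depth_feature_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
    quality: Sequence[float],
    importance: Sequence[float],
) -> float:
    """Weighted mean of per-location squared L2 distances between depth features.

    student[i], teacher[i]: depth-feature vectors at spatial location i.
    quality[i], importance[i]: hypothetical per-location scalar weights.
    """
    if not (len(student) == len(teacher) == len(quality) == len(importance)):
        raise ValueError("all inputs must cover the same locations")
    num, den = 0.0, 0.0
    for s, t, q, w in zip(student, teacher, quality, importance):
        weight = q * w
        num += weight * sum((si - ti) ** 2 for si, ti in zip(s, t))
        den += weight
    # Avoid dividing by zero when every location is weighted down to 0.
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    # Asymmetric setup (sketch): the teacher would embed the clean image and the
    # student the mixup-noised one; only the student is updated by this loss.
    print(mixup([1.0, 0.0], [0.0, 1.0], lam=0.75))  # [0.75, 0.25]
    print(weighted_depth_feature_loss(
        student=[[1.0, 2.0], [0.0, 0.0]],
        teacher=[[1.0, 0.0], [3.0, 0.0]],
        quality=[1.0, 0.5],
        importance=[1.0, 0.0],   # second location ignored despite its large error
    ))  # 4.0
```

In a real training loop the teacher features would be detached from the graph so that gradients flow only into the student.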

📝 Abstract
Real-time monocular 3D object detection remains challenging due to severe depth ambiguity, viewpoint shifts, and the high computational cost of 3D reasoning. Existing approaches either rely on LiDAR or geometric priors to compensate for missing depth, or sacrifice efficiency to achieve competitive accuracy. We introduce LeAD-M3D, a monocular 3D detector that achieves state-of-the-art accuracy and real-time inference without extra modalities. Our method is powered by three key components. Asymmetric Augmentation Denoising Distillation (A2D2) transfers geometric knowledge from a clean-image teacher to a mixup-noised student via a quality- and importance-weighted depth-feature loss, enabling stronger depth reasoning without LiDAR supervision. 3D-aware Consistent Matching (CM3D) improves prediction-to-ground-truth assignment by integrating 3D MGIoU into the matching score, yielding more stable and precise supervision. Finally, Confidence-Gated 3D Inference (CGI3D) accelerates detection by restricting expensive 3D regression to top-confidence regions. Together, these components set a new Pareto frontier for monocular 3D detection: LeAD-M3D achieves state-of-the-art accuracy on KITTI and Waymo, and the best reported car AP on Rope3D, while running up to 3.6x faster than prior high-accuracy methods. Our results demonstrate that high fidelity and real-time efficiency in monocular 3D detection are simultaneously attainable, without LiDAR, stereo, or geometric assumptions.
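
To illustrate what "integrating 3D MGIoU into the matching score" could look like, here is a sketch under several assumptions. MGIoU is approximated as the mean 1D generalized IoU of the two boxes' projections onto the three face normals of each box. This is a simplified reading of the idea; consult the MGIoU paper for the exact definition. Boxes are assumed to be `(cx, cy, cz, dx, dy, dz, yaw)` with strictly positive sizes and yaw about the vertical axis. The combined score (classification × 2D IoU × rescaled MGIoU, with placeholder exponents) and every function name are hypothetical, not the paper's formula.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float, float, float]  # cx, cy, cz, dx, dy, dz, yaw
Vec = Tuple[float, float, float]


def corners(box: Box) -> List[Vec]:
    """Eight corners of a box rotated by `yaw` about the vertical (z) axis."""
    cx, cy, cz, dx, dy, dz, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    pts = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                lx, ly, lz = sx * dx, sy * dy, sz * dz
                pts.append((cx + c * lx - s * ly, cy + s * lx + c * ly, cz + lz))
    return pts


def face_normals(box: Box) -> List[Vec]:
    """The three distinct face-normal directions of a yaw-rotated box."""
    c, s = math.cos(box[6]), math.sin(box[6])
    return [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]


def project(points: Sequence[Vec], axis: Vec) -> Tuple[float, float]:
    """Interval [min, max] covered by the points projected onto `axis`."""
    vals = [p[0] * axis[0] + p[1] * axis[1] + p[2] * axis[2] for p in points]
    return min(vals), max(vals)


def giou_1d(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Generalized IoU of two 1D intervals (assumes positive lengths); in [-1, 1]."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    hull = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union - (hull - union) / hull


def mgiou_3d(box_a: Box, box_b: Box) -> float:
    """Simplified MGIoU: mean 1D GIoU over the face normals of both boxes."""
    pa, pb = corners(box_a), corners(box_b)
    axes = face_normals(box_a) + face_normals(box_b)
    return sum(giou_1d(project(pa, ax), project(pb, ax)) for ax in axes) / len(axes)


def matching_score(cls_score: float, iou_2d: float, mgiou: float,
                   alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Hypothetical score: classification x 2D overlap x rescaled 3D MGIoU."""
    # MGIoU lies in [-1, 1]; map it to [0, 1] so the product stays non-negative.
    m = (mgiou + 1.0) / 2.0
    return (cls_score ** alpha) * (iou_2d ** beta) * (m ** gamma)


if __name__ == "__main__":
    gt = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0)
    far = (10.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0)
    print(mgiou_3d(gt, gt))   # 1.0
    print(mgiou_3d(gt, far))  # about 0.394 (13/33): only the x-axis projections disagree
```

With a term like this in the score, a prediction whose image-plane box fits well but whose 3D box is misplaced ranks lower during assignment. That is the kind of 3D-consistent supervision CM3D aims for.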
Problem

Research questions and friction points this paper is trying to address.

Addresses depth ambiguity and viewpoint shifts in monocular 3D detection
Reduces computational cost for real-time 3D reasoning without LiDAR
Improves accuracy and efficiency simultaneously without geometric priors
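
Improving accuracy and efficiency at the same time is what the summary and abstract frame as a Pareto trade-off: a method lies on the frontier if no other method is both at least as fast and at least as accurate, and strictly better on one of the two. The sketch below computes such a frontier over (latency, AP) pairs. `pareto_front` is a hypothetical helper, and the sample numbers are invented for illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of non-dominated entries, ordered by latency.

    results maps a model name to (latency_ms, ap); lower latency and higher AP
    are both better. An entry is dominated if another entry is at least as good
    on both axes and strictly better on at least one.
    """
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

    front = [
        name for name, point in results.items()
        if not any(dominates(other, point)
                   for other_name, other in results.items() if other_name != name)
    ]
    return sorted(front, key=lambda name: results[name][0])


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    toy = {"A": (10.0, 20.0), "B": (30.0, 25.0), "C": (35.0, 24.0), "D": (8.0, 15.0)}
    print(pareto_front(toy))  # ['D', 'A', 'B']; "C" is slower and less accurate than "B"
```

Setting a "new Pareto frontier" then means the new method's point dominates at least part of the previous frontier.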
Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric distillation transfers geometric knowledge without LiDAR
3D-aware matching integrates MGIoU for precise supervision
Confidence-gated inference speeds up detection by restricting 3D regression to top-confidence regions
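
Confidence gating can be sketched as "score everything cheaply, regress 3D only for the best few". This sketch assumes a dense head in which a cheap classification branch scores every location, while a separate 3D branch (depth, dimensions, orientation) is the expensive part. `confidence_gated_3d`, `regress_3d`, `top_k`, and `min_score` are hypothetical names, and the paper's actual gating rule and thresholds are not specified in this summary.

```python
from typing import Callable, List, Sequence, Tuple

Detection = Tuple[int, float, Tuple[float, ...]]  # (location index, confidence, 3D output)


def confidence_gated_3d(
    scores: Sequence[float],
    features: Sequence[Sequence[float]],
    regress_3d: Callable[[Sequence[float]], Tuple[float, ...]],
    top_k: int = 100,
    min_score: float = 0.0,
) -> List[Detection]:
    """Run the costly 3D regression only on the most confident candidate locations."""
    if len(scores) != len(features):
        raise ValueError("scores and features must be aligned")
    # 1. Rank every location by its cheap classification confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # 2. Keep at most top_k locations that also clear the confidence floor.
    kept = [i for i in order[:top_k] if scores[i] >= min_score]
    # 3. Only the kept locations pay for the 3D head.
    return [(i, scores[i], regress_3d(features[i])) for i in kept]


if __name__ == "__main__":
    calls = []

    def toy_head(feat: Sequence[float]) -> Tuple[float, ...]:
        calls.append(feat)  # record how often the "expensive" head runs
        return (sum(feat),)

    dets = confidence_gated_3d([0.1, 0.9, 0.5, 0.7], [[1.0], [2.0], [3.0], [4.0]],
                               toy_head, top_k=2)
    print(dets)        # [(1, 0.9, (2.0,)), (3, 0.7, (4.0,))]
    print(len(calls))  # 2
```

With N candidate locations and K kept, the 3D head runs K times instead of N. The saving grows with image resolution, because N scales with the feature-map size while K stays fixed.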