MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses domain shift in monocular 3D object detection across heterogeneous sensors, environments, and camera configurations. To tackle this challenge, the authors propose MonoCT, an unsupervised domain adaptation framework with three key components: (1) a Generalized Depth Enhancement (GDE) module that improves the robustness of depth estimation under domain shift; (2) Pseudo-Label Scoring (PLS) and Diversity Maximization (DM) strategies grounded in inner-model consistency, which raise both the reliability and the coverage of pseudo labels; and (3) a consistency-based teacher-student architecture for self-supervised learning from those pseudo labels. Evaluated on six benchmarks, including KITTI and Waymo, MonoCT outperforms state-of-the-art domain adaptation methods by at least ~21% in AP<sub>Mod</sub> and generalizes across diverse deployment scenarios, including automotive, traffic-surveillance, and UAV-mounted camera views.

📝 Abstract
We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups. In this paper, we introduce a novel unsupervised domain adaptation approach, MonoCT, that generates highly accurate pseudo labels for self-supervision. Inspired by our observation that accurate depth estimation is critical to mitigating domain shifts, MonoCT introduces a novel Generalized Depth Enhancement (GDE) module with an ensemble concept to improve depth estimation accuracy. Moreover, we introduce a novel Pseudo Label Scoring (PLS) module by exploring inner-model consistency measurement and a Diversity Maximization (DM) strategy to further generate high-quality pseudo labels for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing SOTA domain adaptation methods by large margins (~21% minimum for AP Mod.) and generalizes well to car, traffic camera and drone views.
Problem

Research questions and friction points this paper is trying to address.

Monocular 3D detection across different sensors and environments.
Unsupervised domain adaptation for accurate pseudo label generation.
Improving depth estimation to mitigate domain shift effects.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised domain adaptation for monocular 3D detection
Generalized Depth Enhancement module improves depth accuracy
Pseudo Label Scoring with inner-model consistency and diversity
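The self-training loop implied by these contributions follows the familiar mean-teacher pattern: score the teacher's detections, keep only trustworthy pseudo labels for the student, and update the teacher as an exponential moving average (EMA) of the student. The sketch below is an illustration of that pattern only; the function names (`ema_update`, `score_pseudo_label`), the multiplicative scoring rule, and the constants `EMA_DECAY` and `SCORE_TAU` are assumptions for exposition, not the authors' actual implementation.

```python
# Minimal sketch of a consistency-based teacher-student self-training loop,
# assuming a mean-teacher-style EMA update and a toy pseudo-label score.
from dataclasses import dataclass

EMA_DECAY = 0.999  # assumed teacher momentum, typical for mean-teacher setups
SCORE_TAU = 0.7    # assumed acceptance threshold for pseudo labels


def ema_update(teacher_params, student_params, decay=EMA_DECAY):
    """Teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]


@dataclass
class Detection:
    cls_conf: float          # classification confidence of the detection
    depth_consistency: float # agreement among depth cues (GDE-style ensemble)


def score_pseudo_label(det: Detection) -> float:
    """Toy PLS-style score: weight confidence by inner-model depth consistency."""
    return det.cls_conf * det.depth_consistency


def filter_pseudo_labels(dets, tau=SCORE_TAU):
    """Keep only detections whose combined score clears the threshold."""
    return [d for d in dets if score_pseudo_label(d) >= tau]
```

In each round, the student would be trained on `filter_pseudo_labels(teacher_detections)` and the teacher refreshed with `ema_update`, so the teacher changes slowly and provides stable targets. A diversity-maximization step would additionally select labels covering varied depths and object sizes, which the toy threshold above does not model.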
Johannes Meier
TU Munich, DeepScenario, Munich Center for Machine Learning
Louis Inchingolo
TU Munich, DeepScenario
Oussema Dhaouadi
PhD Student
Computer Vision, Deep Learning, Robotics
Yan Xia
TU Munich, Munich Center for Machine Learning
Jacques Kaiser
DeepScenario
Deep networks, robotics, computer vision
Daniel Cremers
Technical University of Munich
Computer Vision, Machine Learning, Optimization, Robotics