CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector

📅 2025-08-14
🤖 AI Summary
Monocular 3D object detectors degrade sharply when the test-time camera height differs from the one used in training. Through a systematic analysis on an extended CARLA dataset with multiple camera heights, this work identifies depth estimation as a primary factor behind the degradation. It shows, both mathematically and empirically, that the mean depth error of regressed and ground-based depth models trends negative and positive, respectively, as the camera height changes. Building on this observation, the proposed CHARM3R averages the two depth estimates within the model, in contrast to existing methods that rely on Plücker embeddings, image transformations, or data augmentation. CHARM3R improves generalization to unseen camera heights by more than 45% and achieves state-of-the-art performance on the CARLA dataset.

📝 Abstract
Monocular 3D object detectors, while effective on data from one ego camera height, struggle with unseen or out-of-distribution camera heights. Existing methods often rely on Plücker embeddings, image transformations, or data augmentation. This paper takes a step towards this understudied problem by first investigating the impact of camera height variations on state-of-the-art (SoTA) Mono3D models. With a systematic analysis on the extended CARLA dataset with multiple camera heights, we observe that depth estimation is a primary factor influencing performance under height variations. We mathematically prove and also empirically observe consistent negative and positive trends in mean depth error of regressed and ground-based depth models, respectively, under camera height changes. To mitigate this, we propose Camera Height Robust Monocular 3D Detector (CHARM3R), which averages both depth estimates within the model. CHARM3R improves generalization to unseen camera heights by more than $45\%$, achieving SoTA performance on the CARLA dataset. Code and models are available at https://github.com/abhi1kumar/CHARM3R
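For intuition about what a ground-based depth estimate depends on, the sketch below uses the standard flat-ground pinhole relation $z = f_y H / (v - c_y)$, where $H$ is the camera height and $v$ is the image row of a ground-contact pixel. This is an illustrative assumption (flat ground, zero camera pitch), not necessarily the formulation used in the paper; the function name `ground_depth` and all numeric values are hypothetical.

```python
def ground_depth(v_bottom, focal_y, center_y, cam_height):
    """Depth (m) of a ground-contact pixel under a flat-ground, zero-pitch pinhole model.

    Assumed relation (not taken from the paper): z = focal_y * cam_height / (v_bottom - center_y).
    """
    dv = v_bottom - center_y
    if dv <= 0:
        # Rows at or above the principal-point row never intersect the ground plane.
        raise ValueError("pixel must lie below the horizon row")
    return focal_y * cam_height / dv


# Hypothetical intrinsics and pixel row, chosen only for illustration.
z_train = ground_depth(v_bottom=250.0, focal_y=700.0, center_y=180.0, cam_height=1.5)
z_shift = ground_depth(v_bottom=250.0, focal_y=700.0, center_y=180.0, cam_height=2.0)

print(f"depth with camera height 1.5 m: {z_train:.2f} m")
print(f"depth with camera height 2.0 m: {z_shift:.2f} m")
```

Under this simplified model, the depth assigned to a fixed pixel row scales linearly with the camera height, which is one way to see why a change in height can shift geometry-derived depth in a structured rather than random way.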
Problem

Research questions and friction points this paper is trying to address.

Addresses monocular 3D detector failure at unseen camera heights
Investigates camera height impact on depth estimation performance
Proposes a robust model that averages depth estimates to improve generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Averages regressed and ground-based depth estimates
Improves generalization to unseen camera heights
Achieves state-of-the-art performance on CARLA dataset
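The idea of averaging two depth estimates with opposite error trends can be sketched as follows. This is a minimal illustration that assumes a simple unweighted mean and made-up depth values; how CHARM3R combines the estimates inside the network may differ in detail, and the name `average_depth` is hypothetical.

```python
def average_depth(z_regressed, z_ground):
    """Unweighted mean of a regressed and a ground-based depth estimate (assumed form)."""
    return 0.5 * (z_regressed + z_ground)


# Hypothetical values: the regressed depth under-estimates (negative error)
# while the ground-based depth over-estimates (positive error).
z_true = 20.0
z_regressed = 18.5
z_ground = 21.0

z_avg = average_depth(z_regressed, z_ground)

for name, z in [("regressed", z_regressed), ("ground", z_ground), ("averaged", z_avg)]:
    print(f"{name:>9}: depth = {z:.2f} m, error = {z - z_true:+.2f} m")
```

When the two errors have opposite signs, they partly cancel in the mean, so the averaged estimate in this toy case lies closer to the true depth than either input.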
Abhinav Kumar
Michigan State University
Yuliang Guo
Bosch Research North America - Formerly @ Baidu Apollo, Brown University (PhD)
Computer Vision, 3D Vision, Physical AI
Zhihao Zhang
Michigan State University
Xinyu Huang
Bosch Research North America, Bosch Center for AI
Liu Ren
VP and Chief Scientist, Scalable and Assistive AI, ADAS AI, Bosch Research North America (BCAI)
Artificial Intelligence, ADAS AI, Visual Analytics, Computer Vision
Xiaoming Liu
Michigan State University