🤖 AI Summary
In monocular depth estimation, global normalization amplifies pseudo-label noise, severely limiting knowledge distillation performance. To address this, we propose a cross-context distillation framework with multi-teacher collaboration. First, we systematically analyze how depth normalization affects pseudo-label quality. Second, we design a cross-scale context modeling mechanism that jointly leverages global and local depth cues to make pseudo-labels more robust. Third, we introduce a complementary multi-teacher distillation paradigm that mitigates the generalization bottleneck of single-teacher models. Our method pioneers cross-context distillation, explicitly bridging contextual information across scales for improved pseudo-label fidelity. Evaluated on the NYUv2 and KITTI benchmarks, it achieves state-of-the-art performance, reducing AbsRel by 12.3% over prior methods. Qualitative results further confirm substantial improvements in depth-detail preservation and boundary accuracy.
📝 Abstract
Monocular depth estimation (MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current distillation pipelines rely on global depth normalization, which can amplify noise in pseudo-labels and reduce distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages the complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
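To make the global-vs-local normalization issue concrete, here is a minimal NumPy sketch. The paper's exact loss is not given in the abstract, so this illustrates the general idea only: a MiDaS-style affine-invariant normalization (median shift, mean-absolute-deviation scale) applied either to the whole depth map or independently to local windows, so that noisy pseudo-label values in one region do not skew the normalization statistics everywhere else. The function names, the window size, and the L1 comparison are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def normalize_depth(d, eps=1e-6):
    """Affine-invariant depth normalization: subtract the median,
    divide by the mean absolute deviation (MiDaS-style)."""
    t = np.median(d)
    s = np.mean(np.abs(d - t)) + eps
    return (d - t) / s

def local_normalized_loss(student, teacher, crop=64):
    """Hypothetical local-context distillation loss: normalize the
    student prediction and the teacher pseudo-label within each local
    window independently, then average an L1 discrepancy over windows.
    A global variant would instead call normalize_depth() once on the
    full maps, letting outlier regions dominate the shared statistics."""
    h, w = student.shape
    losses = []
    for i in range(0, h - crop + 1, crop):
        for j in range(0, w - crop + 1, crop):
            s_win = normalize_depth(student[i:i + crop, j:j + crop])
            t_win = normalize_depth(teacher[i:i + crop, j:j + crop])
            losses.append(np.mean(np.abs(s_win - t_win)))
    return float(np.mean(losses))
```

Because each window is normalized on its own, a corrupted patch in the pseudo-label perturbs only the loss terms for that window; under global normalization the same patch would shift the median and scale used for every pixel.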