Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular depth estimation suffers from poor out-of-distribution generalization in real-world scenes, with state-of-the-art models such as Depth Anything V2 (DA-V2) exhibiting significant accuracy degradation on such inputs. To address this, we propose a test-time self-supervised depth refinement framework that operates without ground-truth depth labels. First, we leverage diffusion models for re-lighting and input augmentation, incorporating shape-from-shading priors. Second, we integrate joint geometric-photometric modeling into Score Distillation Sampling (SDS). Third, we adopt a targeted optimization strategy: freezing the backbone encoder while fine-tuning only the decoder and optimizing intermediate implicit embeddings, thereby preventing optimization collapse. Evaluated across multiple benchmarks, our method substantially improves depth accuracy and geometric consistency, significantly outperforming DA-V2. This demonstrates the efficacy of generative self-supervision in enhancing geometric reasoning for monocular depth estimation.
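The third point, the targeted optimization strategy, can be sketched as a parameter-selection rule. This is an illustrative reconstruction only: the module names (`encoder`, `decoder`, `embeddings`) are hypothetical placeholders, not the paper's actual code.

```python
# Sketch of the targeted test-time optimization described above.
# Module name prefixes are illustrative placeholders, not the paper's API.

def select_trainable(param_names):
    """Freeze the backbone encoder; fine-tune the decoder and the
    intermediate implicit embeddings (the anti-collapse recipe)."""
    trainable = []
    for name in param_names:
        if name.startswith("encoder."):
            continue  # frozen: preserves the pretrained DA-V2 features
        if name.startswith("decoder.") or name.startswith("embeddings."):
            trainable.append(name)
    return trainable

params = [
    "encoder.block1.weight",
    "encoder.block2.weight",
    "embeddings.latent",
    "decoder.head.weight",
]
print(select_trainable(params))  # → ['embeddings.latent', 'decoder.head.weight']
```

In a framework like PyTorch, the same rule would amount to setting `requires_grad = False` on the encoder's parameters and passing only the decoder and embedding parameters to the optimizer.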

📝 Abstract
Monocular depth estimation remains challenging: even recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We introduce Re-Depth Anything, a test-time self-supervision framework that bridges this domain gap by fusing DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method performs label-free refinement directly on the input image by re-lighting predicted depth maps and augmenting the input. This re-synthesis replaces classical photometric reconstruction, leveraging shape-from-shading (SfS) cues in a new, generative context with Score Distillation Sampling (SDS). To prevent optimization collapse, our framework employs a targeted optimization strategy: rather than optimizing depth directly or fine-tuning the full model, we freeze the encoder, update only intermediate embeddings, and fine-tune the decoder. Across diverse benchmarks, Re-Depth Anything yields substantial gains in depth accuracy and realism over DA-V2, opening new avenues for generative self-supervision in geometric reasoning.
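The SDS component mentioned in the abstract has a well-known generic form: noise the rendered image, ask the diffusion prior to predict that noise, and use the prediction error as a gradient signal. A minimal numpy sketch of that generic step follows; the toy noise schedule, unit weighting, and `denoiser` interface are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sds_gradient(x, denoiser, t, rng):
    """One generic Score Distillation Sampling step (illustrative form,
    not the paper's exact implementation).

    x        -- rendered image (e.g. a re-lit depth map), as an array
    denoiser -- diffusion prior predicting the noise added at timestep t
    t        -- timestep in (0, 1); here tied to a toy linear schedule
    rng      -- numpy Generator supplying the injected noise
    """
    eps = rng.standard_normal(x.shape)           # noise added to the render
    alpha = 1.0 - t                              # toy noise schedule (assumption)
    x_t = np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * eps
    eps_hat = denoiser(x_t, t)                   # prior's noise estimate
    w = 1.0                                      # timestep weighting (schedule-dependent)
    return w * (eps_hat - eps)                   # gradient w.r.t. the rendered image
```

In the paper's setting this gradient would be backpropagated through the re-lighting renderer into the embeddings and decoder, rather than into the image directly.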
Problem

Research questions and friction points this paper is trying to address.

Refines monocular depth estimation for real-world images
Bridges domain gaps using self-supervised test-time refinement
Enhances depth accuracy via generative re-lighting and optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised test-time refinement via re-lighting
Fuses depth model with diffusion priors using SDS
Optimizes only embeddings and decoder, freezes encoder
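The re-lighting idea in the first bullet rests on classical shape-from-shading: a depth map determines surface normals, and normals plus a light direction determine shading. The paper performs re-lighting with a diffusion model, but the underlying geometric cue can be sketched with a simple Lambertian model (illustrative only):

```python
import numpy as np

def lambertian_relight(depth, light_dir):
    """Shade a depth map via shape-from-shading: derive surface normals
    from depth gradients, then apply a Lambertian model. Illustrative
    stand-in for the paper's diffusion-based re-lighting."""
    dzdy, dzdx = np.gradient(depth)              # per-pixel depth slopes
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    l = np.asarray(light_dir, dtype=float)
    l /= np.linalg.norm(l)
    return np.clip(normals @ l, 0.0, None)       # per-pixel shading, >= 0
```

A flat depth map lit head-on shades uniformly to 1.0, while any depth error changes the normals and hence the shading, which is what makes the re-lit image a useful self-supervision signal.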