🤖 AI Summary
Single-image 2.5D relief (depth/normal) estimation suffers from poor robustness under complex materials and lighting conditions, a problem compounded by the severe scarcity of authentic ground-truth annotations. Method: This paper proposes a pseudo-real–real collaborative end-to-end learning framework. It leverages text-to-image diffusion models to synthesize large-scale, diverse pseudo-real image–relief pairs; integrates joint depth–normal modeling with multi-view reconstruction to generate high-fidelity pseudo-labels; and incorporates a small set of real captured data via a progressive training strategy. Contribution/Results: The method achieves state-of-the-art performance on both depth and normal prediction benchmarks, significantly outperforming existing approaches. It demonstrates strong generalization and practical utility in downstream applications, including texture transfer and relighting, validating its effectiveness in realistic scenarios.
📝 Abstract
This paper presents MonoRelief V2, an end-to-end model designed to directly recover 2.5D reliefs from single images under complex material and illumination variations. In contrast to its predecessor, MonoRelief V1 [1], which was trained solely on synthetic data, MonoRelief V2 incorporates real data to achieve improved robustness, accuracy, and efficiency. To overcome the challenge of acquiring a large-scale real-world dataset, we generate approximately 15,000 pseudo-real images using a text-to-image generative model and derive corresponding depth pseudo-labels through fusion of depth and normal predictions. Furthermore, we construct a small-scale real-world dataset (800 samples) via multi-view reconstruction and detail refinement. MonoRelief V2 is then progressively trained on the pseudo-real and real-world datasets. Comprehensive experiments demonstrate its state-of-the-art performance in both depth and normal prediction, highlighting its strong potential for a range of downstream applications. Code is at: https://github.com/glp1001/MonoreliefV2.
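The abstract mentions deriving depth pseudo-labels by fusing depth and normal predictions. The paper does not spell out the fusion scheme here, but a common formulation is a screened-Poisson least-squares problem: find a depth field whose gradients match the gradients implied by the normal map while staying close to the predicted depth. The sketch below is an illustrative assumption (function name, regularization weight `lam`, and the Jacobi solver are ours, not from the paper):

```python
import numpy as np

def fuse_depth_normals(depth, normals, lam=0.5, iters=2000):
    """Illustrative depth-normal fusion (not the paper's exact method).

    Minimizes  ||grad z - g||^2 + lam * ||z - depth||^2,
    where the target gradient g = (-nx/nz, -ny/nz) comes from the
    normal map. Solved with plain Jacobi iterations; edge-replicate
    padding approximates Neumann boundary conditions.
    """
    d = depth.astype(np.float64)
    nz = np.clip(normals[..., 2], 1e-3, None)  # avoid division by zero
    gx = -normals[..., 0] / nz                 # target dz/dx
    gy = -normals[..., 1] / nz                 # target dz/dy

    z = d.copy()
    for _ in range(iters):
        zp = np.pad(z, 1, mode="edge")
        gxp = np.pad(gx, 1, mode="edge")
        gyp = np.pad(gy, 1, mode="edge")
        # sum of the four neighbors of each pixel
        nb = (zp[1:-1, :-2] + zp[1:-1, 2:] +
              zp[:-2, 1:-1] + zp[2:, 1:-1])
        # divergence of the target gradient field (backward differences)
        div = (gxp[1:-1, :-2] - gx) + (gyp[:-2, 1:-1] - gy)
        # Jacobi update for (4 + lam) z = neighbors + div + lam * depth
        z = (nb + div + lam * d) / (4.0 + lam)
    return z
```

For a flat region (normals all pointing along +z, constant predicted depth) the fused result reproduces the input depth exactly; in general the normal term sharpens fine detail while the depth term anchors the low-frequency shape.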