SAIL: Self-supervised Albedo Estimation from Real Images with a Latent Diffusion Model

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Intrinsic image decomposition of real-world images is hampered by three problems: ground-truth albedo annotations are unavailable, methods supervised on synthetic data generalize poorly, and self-supervised approaches leave residual reflections in the albedo and produce results that change with the illumination. To address these, the paper proposes SAIL, a self-supervised single-view albedo estimation framework built on a pre-trained latent diffusion model (LDM). The method operates end-to-end in the latent space and introduces two regularization terms, one on the illumination-dependent component and one on the illumination-independent component, to enforce consistency across illuminations and suppress specular reflections. It repurposes the LDM's prior for unconditioned scene relighting as a surrogate objective for albedo estimation, so training needs no ground-truth labels or synthetic data, only unlabeled multi-illumination images available online. Experiments demonstrate illumination-robust albedo estimation with effective specular suppression on real-world scenes, outperforming existing self-supervised and weakly supervised methods in cross-domain generalization.

📝 Abstract

Intrinsic image decomposition aims at separating an image into its underlying albedo and shading components, isolating the base color from lighting effects to enable downstream applications such as virtual relighting and scene editing. Despite the rise and success of learning-based approaches, intrinsic image decomposition from real-world images remains a significant challenge due to the scarcity of labeled ground-truth data. Most existing solutions rely on synthetic data in supervised setups, limiting their ability to generalize to real-world scenes. Self-supervised methods, on the other hand, often produce albedo maps that contain reflections and lack consistency under different lighting conditions. To address this, we propose SAIL, an approach designed to estimate albedo-like representations from single-view real-world images. We repurpose the prior knowledge of a latent diffusion model for unconditioned scene relighting as a surrogate objective for albedo estimation. To extract the albedo, we introduce a novel intrinsic image decomposition fully formulated in the latent space. To guide the training of our latent diffusion model, we introduce regularization terms that constrain both the lighting-dependent and lighting-independent components of our latent image decomposition. SAIL predicts stable albedo under varying lighting conditions and generalizes to multiple scenes, using only unlabeled multi-illumination data available online.
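
For readers new to the albedo/shading split, the sketch below illustrates the classical image-space model in which each pixel is the product of albedo and shading. It is a textbook simplification under stated assumptions, not SAIL's latent-space decomposition: the function names `compose` and `recover_shading`, the grayscale nested-list images, and the toy values are all hypothetical.

```python
# Classical image-space intrinsic model: image = albedo * shading, per pixel.
# Textbook formulation for illustration only; SAIL works in a latent space.

def compose(albedo, shading):
    """Render a grayscale image from per-pixel albedo and shading."""
    return [[a * s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(albedo, shading)]

def recover_shading(image, albedo, eps=1e-6):
    """Invert the model for shading when albedo is known.

    eps guards against division by zero on black albedo pixels.
    """
    return [[i / max(a, eps) for i, a in zip(i_row, a_row)]
            for i_row, a_row in zip(image, albedo)]

# One scene (fixed albedo) observed under two hypothetical lighting conditions.
albedo = [[0.8, 0.2], [0.5, 0.5]]
shading_day = [[1.0, 1.0], [0.5, 0.5]]
shading_dusk = [[0.25, 0.5], [0.25, 0.5]]

image_day = compose(albedo, shading_day)
image_dusk = compose(albedo, shading_dusk)
```

The two rendered images differ, yet they share the same albedo. That shared factor is what an albedo estimator is expected to return for both inputs.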

Problem

Research questions and friction points this paper is trying to address.

Estimating albedo from real images without labeled data

Overcoming limitations of synthetic data in intrinsic decomposition

Ensuring albedo consistency across varying lighting conditions
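
One simple way to quantify the consistency goal above is to predict albedo for several photos of the same scene under different lighting and measure how much the predictions disagree. The sketch below is an assumed, illustrative metric (mean per-pixel variance), not the evaluation protocol used in the paper; `albedo_inconsistency` is a hypothetical name and the inputs are grayscale nested lists of equal size.

```python
def albedo_inconsistency(albedo_maps):
    """Mean per-pixel variance across albedo maps of one scene.

    Each map is the albedo predicted from a differently lit photo of the
    same scene. A value of 0.0 means the predictions are identical.
    """
    n = len(albedo_maps)
    height, width = len(albedo_maps[0]), len(albedo_maps[0][0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            values = [m[y][x] for m in albedo_maps]
            mean = sum(values) / n
            # Population variance of this pixel across lighting conditions.
            total += sum((v - mean) ** 2 for v in values) / n
    return total / (height * width)

# Hypothetical predictions: the first pair agrees, the second pair drifts.
stable = albedo_inconsistency([[[0.5, 0.2]], [[0.5, 0.2]]])
drifting = albedo_inconsistency([[[0.4, 0.2]], [[0.6, 0.2]]])
```

A lower score indicates predictions that are more stable under lighting changes. The metric says nothing about whether the predicted albedo is correct, only that it is consistent.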

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised albedo estimation from real images

Latent diffusion model for intrinsic decomposition

Regularization terms for lighting-dependent and lighting-independent components

🔎 Similar Papers

Latent Intrinsics Emerge from Training to Relight