VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating geometrically consistent and photorealistic cast shadows in image synthesis remains highly challenging due to the ill-posed nature of the problem. This work proposes a two-stage visibility-constrained diffusion framework: the first stage predicts a coarse shadow mask to localize potential shadow regions, while the second stage integrates lighting and depth cues through conditional diffusion to synthesize fine-grained shadows. Key innovations include a dual-path visibility prior mechanism: a control branch employs shadow-gated cross-attention to provide multi-scale structural guidance, and a learnable soft prior map reweights the loss in error-prone regions to enhance geometric consistency. Additionally, a high-frequency guidance module refines shadow boundaries and texture blending. The method achieves state-of-the-art performance on DESOBAv2, significantly improving both accuracy and realism in shadow generation.
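The soft-prior reweighting described above can be pictured as a per-pixel weighted diffusion loss. The sketch below is a minimal, hypothetical illustration (the function name, the additive weighting scheme, and the `base_weight` parameter are assumptions, not the paper's exact formulation): error-prone regions, flagged by a prior map in [0, 1], simply contribute more to the training loss.

```python
import numpy as np

def visibility_weighted_loss(pred_noise, true_noise, soft_prior, base_weight=1.0):
    """Per-pixel MSE reweighted by a learned soft visibility prior map.

    Illustrative sketch only: pixels where the prior map is high
    (error-prone shadow regions) are weighted up, pushing the diffusion
    model to correct geometry there. Arrays share the same spatial shape.
    """
    per_pixel = (pred_noise - true_noise) ** 2   # standard noise-prediction MSE
    weights = base_weight + soft_prior           # prior values in [0, 1]
    return float((weights * per_pixel).mean())
```

With a zero prior map this reduces to a plain MSE; a prior of 1 everywhere doubles each pixel's contribution, so the mechanism only redistributes emphasis rather than changing the loss family.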

📝 Abstract
Generating realistic cast shadows for inserted foreground objects is a crucial yet challenging problem in image composition, where maintaining geometric consistency between shadow and object in complex scenes remains difficult due to the ill-posed nature of shadow formation. To address this issue, we propose VSDiffusion, a visibility-constrained two-stage framework designed to narrow the solution space by incorporating visibility priors. In Stage I, we predict a coarse shadow mask to localize plausible shadow regions. In Stage II, conditional diffusion guided by lighting and depth cues estimated from the composite generates accurate shadows. VSDiffusion injects visibility priors through two complementary pathways: a visibility control branch with shadow-gated cross-attention that provides multi-scale structural guidance, and a learned soft prior map that reweights the training loss in error-prone regions to enhance geometric correction. Additionally, we introduce a high-frequency guided enhancement module to sharpen boundaries and improve texture interaction with the background. Experiments on the widely used public DESOBAv2 dataset demonstrate that VSDiffusion generates accurate shadows and establishes new SOTA results across most evaluation metrics.
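The "shadow-gated cross-attention" in the visibility control branch can be sketched as ordinary cross-attention whose output is modulated by a per-token gate derived from the coarse shadow mask. The code below is an assumed, simplified rendering (function names, shapes, and the multiplicative gating are illustrative; the paper's exact attention design may differ): queries come from the diffusion backbone, keys/values from the control branch, and the gate suppresses tokens outside plausible shadow regions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shadow_gated_cross_attention(q, k, v, shadow_gate):
    """Cross-attention output modulated by a per-token shadow gate.

    q:           (N, d) queries from the diffusion features
    k, v:        (M, d) keys/values from the visibility control branch
    shadow_gate: (N, 1) gate in [0, 1] from the coarse shadow mask

    Hypothetical sketch of the mechanism, not the paper's implementation.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))  # (N, M) attention weights
    out = attn @ v                        # (N, d) aggregated values
    return shadow_gate * out              # zero out non-shadow positions
```

A gate of zero leaves the backbone features untouched by the control branch, so structural guidance is injected only where the Stage-I mask deems a shadow plausible.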
Problem

Research questions and friction points this paper is trying to address.

shadow generation
image composition
geometric consistency
ill-posed problem
cast shadows
Innovation

Methods, ideas, or system contributions that make the work stand out.

visibility-constrained diffusion
shadow generation
two-stage framework
geometric consistency
high-frequency enhancement
Jing Li
East China University of Science and Technology, Shanghai, China
Jing Zhang
East China University of Science and Technology
computer vision, image understanding