🤖 AI Summary
This paper addresses the challenge of generating physically consistent shadows from monocular RGB images. Methodologically, it introduces a novel framework that jointly leverages explicit geometric and illumination modeling and diffusion-based priors. First, dense point maps are estimated from the input image to reconstruct approximate 3D scene geometry, and the dominant light direction is predicted. Second, a physically grounded initial shadow is synthesized via ray projection under the estimated geometry and lighting. Finally, this physics-informed shadow serves as a conditional input to a diffusion model for detail refinement and photorealistic enhancement. The key contribution is the explicit integration of interpretable, physics-based shadow formation, governed by geometry and illumination, into an end-to-end deep learning pipeline, thereby avoiding the physical implausibility inherent in purely data-driven approaches. Experiments on DESOBAV2 demonstrate substantial improvements over state-of-the-art methods in both visual fidelity and physical plausibility, particularly under complex geometries and ambiguous lighting conditions.
📝 Abstract
Shadow generation aims to produce photorealistic shadows that are visually consistent with object geometry and scene illumination. In the physics of shadow formation, the occluder blocks some light rays cast from the light source that would otherwise reach the surface, creating a shadow that follows the silhouette of the occluder. However, such explicit physical modeling has rarely been used in deep-learning-based shadow generation. In this paper, we propose a novel framework that embeds explicit physical modeling (geometry and illumination) into deep-learning-based shadow generation. First, given a monocular RGB image, we obtain approximate 3D geometry in the form of dense point maps and predict a single dominant light direction. These signals allow us to recover a fairly accurate shadow location and shape based on the physics of shadow formation. We then integrate this physics-based initial estimate into a diffusion framework that refines the shadow into a realistic, high-fidelity appearance while ensuring consistency with scene geometry and illumination. Trained on DESOBAV2, our model produces shadows that are both visually realistic and physically coherent, outperforming existing approaches, especially in scenes with complex geometry or ambiguous lighting.
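The ray-projection step described in the abstract, casting rays from occluder points along the estimated light direction to find where they hit a receiving surface, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the planar-ground assumption, and the grid parameters are all assumptions made for the example.

```python
import numpy as np

def project_shadow_mask(points, light_dir, ground_y=0.0,
                        grid_res=64, bounds=(-1.0, 1.0)):
    """Sketch of physics-based shadow synthesis: cast a ray from each
    occluder point along the light direction, intersect it with a flat
    ground plane (y == ground_y), and rasterize the hit points into a
    binary shadow mask over the x-z extent given by `bounds`.

    points    : (N, 3) occluder points, e.g. sampled from a dense point map
    light_dir : (3,) direction light travels (from source into the scene)
    """
    d = np.asarray(light_dir, dtype=float)
    d = d / np.linalg.norm(d)
    if abs(d[1]) < 1e-8:  # light parallel to the ground: no finite shadow
        return np.zeros((grid_res, grid_res), dtype=bool)
    # Ray p + t*d meets the plane y = ground_y at t = (ground_y - p_y) / d_y
    t = (ground_y - points[:, 1]) / d[1]
    fwd = t > 0                               # keep only forward intersections
    hits = points[fwd] + t[fwd, None] * d     # 3D hit points on the plane
    lo, hi = bounds
    u = ((hits[:, 0] - lo) / (hi - lo) * (grid_res - 1)).round().astype(int)
    v = ((hits[:, 2] - lo) / (hi - lo) * (grid_res - 1)).round().astype(int)
    ok = (u >= 0) & (u < grid_res) & (v >= 0) & (v < grid_res)
    mask = np.zeros((grid_res, grid_res), dtype=bool)
    mask[v[ok], u[ok]] = True                 # rasterize hits into the mask
    return mask
```

For instance, a single point one unit above the origin lit straight from above casts its shadow directly beneath it, while a light tilted along +x shifts the shadow in +x, matching the silhouette-following behavior the abstract describes. The real pipeline would then feed such a mask to the diffusion model as conditioning.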