🤖 AI Summary
This work addresses two limitations of existing shadow removal methods: they often lack physical interpretability, and they struggle to recover fine local textures while preserving global illumination consistency. To this end, the authors propose CFSR, a framework that formulates shadow removal as a physically constrained restoration process. Operating in the HVI color space, CFSR integrates 3D geometric cues (surface normals) with semantic priors derived from DINO features and a frozen CLIP encoder. The method introduces a geometry- and semantics-guided dual-attention mechanism and a frequency-domain collaborative reconstruction module to explicitly enforce illumination constraints. According to the authors, extensive experiments show that CFSR achieves state-of-the-art performance across multiple challenging benchmarks, improving both visual fidelity and physical plausibility.
📝 Abstract
Traditional shadow removal networks often treat image restoration as an unconstrained mapping and therefore lack the physical interpretability needed to balance local texture recovery with global illumination consistency. To address this, we propose CFSR, a multi-modal, prior-driven framework that reframes shadow removal as a physics-constrained restoration process. CFSR integrates 3D geometric cues with semantics from large-scale foundation models to bridge the 2D-3D domain gap. Specifically, we first map observations into a custom HVI color space to suppress shadow-induced noise and fuse the RGB data with estimated depth priors. At the core of the framework, our Geometric & Semantic Dual Explicit Guided Attention mechanism uses DINO features and 3D surface normals to directly modulate the attention affinity matrix, building physical lighting constraints into the attention structure. To recover severely degraded regions, we inject holistic priors via a frozen CLIP encoder. Finally, our Frequency Collaborative Reconstruction Module (FCRM) decouples the decoding process: conditioned on geometric priors, it jointly handles the reconstruction of sharp high-frequency occlusion boundaries and the restoration of low-frequency global illumination. Extensive experiments demonstrate that CFSR achieves state-of-the-art performance across multiple challenging benchmarks.
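
The abstract describes attention whose affinity matrix is modulated directly by surface normals and DINO features, but does not give the formulation. The NumPy sketch below illustrates one plausible reading, in which cosine similarities of the two priors are added as biases to the scaled dot-product logits before the softmax. The function name `guided_attention`, the additive form, and the weights `alpha` and `beta` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def guided_attention(q, k, v, normals, sem_feats, alpha=1.0, beta=1.0):
    """Hypothetical prior-guided attention over N tokens.

    q, k, v:    (N, d) query, key, and value matrices
    normals:    (N, 3) per-token surface normals (geometric prior)
    sem_feats:  (N, c) per-token semantic features (e.g. from DINO)
    alpha/beta: assumed scalar weights for the two bias terms
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)      # standard scaled dot-product affinity

    n = l2_normalize(normals)
    s = l2_normalize(sem_feats)
    geo_bias = n @ n.T                 # cosine similarity of normals, in [-1, 1]
    sem_bias = s @ s.T                 # cosine similarity of semantic features

    # Assumed modulation: add both priors to the logits before the softmax.
    attn = softmax(logits + alpha * geo_bias + beta * sem_bias, axis=-1)
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
    normals = rng.normal(size=(4, 3))
    sem_feats = rng.normal(size=(4, 16))
    out, attn = guided_attention(q, k, v, normals, sem_feats)
    print(out.shape, attn.shape)
```

Under this reading, tokens that lie on similarly oriented surfaces or share semantics attend to each other more strongly, which is one way a lit region could inform the restoration of a shadowed region on the same surface. Setting `alpha` and `beta` to zero recovers plain scaled dot-product attention.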