🤖 AI Summary
Video shadow detection faces two key challenges: semantic ambiguity between shadows and dark objects against complex backgrounds, and difficulty modeling shadow deformation under dynamic lighting conditions. To address these, we propose a language-guided spatiotemporal disentanglement framework. Our method introduces a Vision-language Match Module and a Dark-aware Semantic Block, enhanced by adaptive mask reweighting and edge-aware supervision, to improve discrimination between shadows and dark objects. Additionally, we incorporate learnable temporal tokens within a Tokenized Temporal Block (TTB) to enable efficient spatiotemporal disentanglement. The framework jointly leverages vision-language pretrained features, dark-aware semantic representations, and edge-mask supervision. Evaluated on multiple benchmarks, our approach achieves state-of-the-art performance while supporting real-time inference, demonstrating significant improvements in both detection accuracy and computational efficiency.
📝 Abstract
Video shadow detection confronts two entwined difficulties: distinguishing shadows from complex backgrounds and modeling dynamic shadow deformations under varying illumination. To address shadow-background ambiguity, we leverage linguistic priors through the proposed Vision-language Match Module (VMM) and a Dark-aware Semantic Block (DSB), extracting text-guided features to explicitly differentiate shadows from dark objects. Furthermore, we introduce adaptive mask reweighting to downweight penumbra regions during training and apply edge masks at the final decoder stage for better supervision. For temporal modeling of variable shadow shapes, we propose a Tokenized Temporal Block (TTB) that decouples spatiotemporal learning. TTB summarizes cross-frame shadow semantics into learnable temporal tokens, enabling efficient sequence encoding with minimal computational overhead. Comprehensive experiments on multiple benchmark datasets demonstrate state-of-the-art accuracy and real-time inference efficiency. Code is available at https://github.com/city-cheng/DTTNet.
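The core idea behind the TTB, summarizing many frame features into a small set of learnable tokens, can be sketched as cross-attention where the tokens act as queries over all spatiotemporal positions. This is a minimal NumPy illustration of the general tokenization mechanism, not the paper's actual implementation; the function name, shapes, and single-head attention are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_tokenize(frame_feats, tokens):
    """Summarize cross-frame features into a few learnable tokens.

    Cross-attention: tokens are queries; flattened per-frame spatial
    features are keys/values. Hypothetical simplification of a TTB-style
    block (single head, no projections or residuals).

    frame_feats: (T, N, C) features for T frames, N spatial positions
    tokens:      (K, C) learnable temporal tokens, K << T * N
    returns:     (K, C) token summaries of the whole clip
    """
    T, N, C = frame_feats.shape
    kv = frame_feats.reshape(T * N, C)            # flatten time x space
    attn = softmax(tokens @ kv.T / np.sqrt(C))    # (K, T*N) attention map
    return attn @ kv                              # (K, C) summaries

# Toy usage: a 4-frame clip, 16 positions per frame, 32 channels,
# compressed into 8 temporal tokens.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 32))
tokens = rng.standard_normal((8, 32))
summary = temporal_tokenize(feats, tokens)        # shape (8, 32)
```

Attending from K tokens rather than among all T·N positions reduces the cost from O((T·N)²) to O(K·T·N), which is consistent with the abstract's claim of efficient sequence encoding with low overhead.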