🤖 AI Summary
Low-light video enhancement (LLVE) suffers from severe noise, low contrast, and color degradation, while existing learning-based methods struggle to exploit temporal information effectively. To address this, we propose DWTA-Net, a two-stage framework that jointly models short- and long-term temporal cues. Stage I employs Visual State-Space blocks for multi-frame alignment, recovering brightness, color, and structure with local consistency; Stage II applies an optical-flow-guided recurrent refinement module whose dynamic weight-based temporal aggregation adaptively balances static and dynamic regions. A texture-adaptive loss further suppresses noise and artifacts while promoting smoothness in flat areas and preserving fine textures. Evaluated on real-world low-light video datasets, DWTA-Net achieves significant improvements over state-of-the-art methods, particularly excelling in dynamic scenes with superior temporal consistency and structural clarity.
📝 Abstract
Low-light video enhancement (LLVE) is challenging due to noise, low contrast, and color degradations. Learning-based approaches offer fast inference but still struggle with heavy noise in real low-light scenes, primarily due to limitations in effectively leveraging temporal information. In this paper, we address this issue with DWTA-Net, a novel two-stage framework that jointly exploits short- and long-term temporal cues. Stage I employs Visual State-Space blocks for multi-frame alignment, recovering brightness, color, and structure with local consistency. Stage II introduces a recurrent refinement module with dynamic weight-based temporal aggregation guided by optical flow, adaptively balancing static and dynamic regions. A texture-adaptive loss further preserves fine details while promoting smoothness in flat areas. Experiments on real-world low-light videos show that DWTA-Net effectively suppresses noise and artifacts, delivering superior visual quality compared with state-of-the-art methods.
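To make the core mechanism concrete, below is a minimal PyTorch sketch of dynamic weight-based temporal aggregation as the abstract describes it: neighbor-frame features are warped toward the reference frame using optical flow, and a small convolutional head predicts per-pixel softmax weights for fusing the reference and warped features. This is an illustrative reconstruction, not the authors' code; the module name, layer sizes, and the choice of softmax normalization are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Backward-warp a feature map with optical flow via bilinear sampling.

    feat: (B, C, H, W) neighbor-frame features
    flow: (B, 2, H, W) flow from the reference frame to the neighbor
    """
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates
    yy, xx = torch.meshgrid(
        torch.arange(h, device=feat.device),
        torch.arange(w, device=feat.device),
        indexing="ij",
    )
    grid = torch.stack((xx, yy), dim=0).float()            # (2, H, W)
    coords = grid.unsqueeze(0) + flow                      # displaced coordinates
    # Normalize to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)

class DynamicWeightAggregation(nn.Module):
    """Predict per-pixel fusion weights over the reference and flow-warped
    neighbor features, then aggregate with a softmax-weighted sum.
    Layer sizes are illustrative assumptions."""

    def __init__(self, channels, num_frames):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels * num_frames, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, num_frames, 3, padding=1),
        )

    def forward(self, ref_feat, neigh_feats, flows):
        # Warp each neighbor toward the reference using its optical flow
        warped = [flow_warp(f, fl) for f, fl in zip(neigh_feats, flows)]
        feats = [ref_feat] + warped                          # num_frames maps
        weights = torch.softmax(
            self.weight_net(torch.cat(feats, dim=1)), dim=1
        )                                                    # (B, T, H, W)
        stacked = torch.stack(feats, dim=1)                  # (B, T, C, H, W)
        return (weights.unsqueeze(2) * stacked).sum(dim=1)   # fused (B, C, H, W)
```

Likewise, one plausible form of the texture-adaptive loss is a reconstruction term weighted up in textured regions plus a total-variation smoothness term weighted up in flat ones. The Charbonnier penalty and the gradient-based weighting below are assumptions; the paper's exact formulation may differ.

```python
def texture_adaptive_loss(pred, target, eps=1e-3, lam=0.1):
    """Hypothetical texture-adaptive loss: fidelity where the target is
    textured, smoothness where it is flat. Not the paper's exact loss."""
    # Charbonnier reconstruction term
    recon = torch.sqrt((pred - target) ** 2 + eps ** 2)
    # Texture map: mean absolute gradient of the target, padded to full size
    gx = F.pad((target[..., :, 1:] - target[..., :, :-1]).abs(), (0, 1))
    gy = F.pad((target[..., 1:, :] - target[..., :-1, :]).abs(), (0, 0, 0, 1))
    texture = (gx + gy).mean(dim=1, keepdim=True)            # (B, 1, H, W)
    w = texture / (texture.mean() + 1e-8)                    # relative texture weight
    fidelity = (w * recon).mean()
    # Total-variation smoothness, emphasized in flat (low-texture) regions
    tv_x = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    tv_y = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    flat = 1.0 / (1.0 + w)
    smooth = (flat[..., :, 1:] * tv_x).mean() + (flat[..., 1:, :] * tv_y).mean()
    return fidelity + lam * smooth
```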