🤖 AI Summary
This work addresses the challenges of excessive local noise, detail loss, and insufficient exploitation of global context in low-light image and video enhancement. To this end, we propose a lightweight Row-Column Separated Attention (RCSA) module, integrated into an enhanced U-Net architecture. RCSA efficiently captures global contextual information by aggregating row-wise and column-wise mean and maximum statistics from feature maps, thereby guiding localized enhancement while significantly reducing computational overhead. Furthermore, we extend this design to video enhancement for the first time and introduce two novel temporal consistency loss functions to ensure smooth inter-frame coherence. Extensive experiments on the LOL, MIT-Adobe FiveK, and SDSD datasets demonstrate state-of-the-art performance, and the code is publicly released.
📝 Abstract
The U-Net structure is widely used for low-light image/video enhancement. Without proper guidance from global information, however, the enhanced images exhibit large local noise and loss of detail. Attention mechanisms can better capture and exploit global information, but applying attention to full images can significantly increase the number of parameters and computations. We propose a Row-Column Separated Attention module (RCSA) inserted after an improved U-Net. The RCSA module takes as input the row-wise and column-wise mean and maximum of the feature map, utilizing global information to guide local enhancement with fewer parameters. We further propose two temporal loss functions that extend the method to low-light video enhancement while maintaining temporal consistency. Extensive experiments on the LOL and MIT Adobe FiveK image datasets and the SDSD video dataset demonstrate the effectiveness of our approach.
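The core idea of the statistic aggregation can be illustrated with a minimal sketch. The code below is a simplified NumPy illustration, not the paper's implementation: it pools row-wise and column-wise mean and max statistics from a `(C, H, W)` feature map, then modulates the map with sigmoid row/column weights as a stand-in for the module's learned attention layers (function names and the weighting scheme are assumptions for illustration).

```python
import numpy as np

def rcsa_stats(feat):
    """Aggregate row/column mean and max statistics from a (C, H, W) feature map.

    Returns row stats of shape (C, H, 2) and column stats of shape (C, W, 2) --
    the pooling step described in the abstract; the module's learned attention
    layers are omitted in this sketch.
    """
    row_mean = feat.mean(axis=2)   # (C, H): mean over columns -> one value per row
    row_max = feat.max(axis=2)     # (C, H)
    col_mean = feat.mean(axis=1)   # (C, W): mean over rows -> one value per column
    col_max = feat.max(axis=1)     # (C, W)
    row_stats = np.stack([row_mean, row_max], axis=-1)  # (C, H, 2)
    col_stats = np.stack([col_mean, col_max], axis=-1)  # (C, W, 2)
    return row_stats, col_stats

def apply_rcsa(feat, row_w, col_w):
    """Modulate the feature map with sigmoid row/column weights.

    This broadcasted gating (a hypothetical stand-in for the learned attention)
    shows how 1-D row/column signals can guide every spatial location, which is
    why the statistics are so much cheaper than full 2-D attention.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    r = sigmoid(row_w)[:, :, None]   # (C, H, 1), broadcast across columns
    c = sigmoid(col_w)[:, None, :]   # (C, 1, W), broadcast across rows
    return feat * r * c

# Toy usage: collapse the (mean, max) pair into per-row / per-column weights.
feat = np.random.rand(4, 8, 8).astype(np.float32)
row_stats, col_stats = rcsa_stats(feat)
out = apply_rcsa(feat, row_stats.mean(-1), col_stats.mean(-1))
```

Note the cost argument: for an H x W map, the statistics have O(H + W) entries per channel instead of the O(H * W) interactions of dense spatial attention, which is how the module keeps parameters and computation low.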