🤖 AI Summary
To address weak feature representation and insufficient long-range dependency modeling in single-image dehazing, this paper proposes MRFNLN, a lightweight and efficient multi-receptive-field non-local network. It introduces three main components: (1) a Multi-Stream Feature Attention Block (MSFAB) and a Cross Non-Local Block (CNLB) that jointly capture multi-scale local structures and global contextual information; (2) Detail-Focused Contrastive Regularization (DFCR), which explicitly enforces fidelity of low-level details; and (3) Spatial Pyramid Down-Sampling (SPDS), which reduces the computation and memory cost of the CNLB. With only 1.48M parameters, MRFNLN achieves state-of-the-art PSNR and SSIM scores on benchmark datasets including SOTS and O-HAZE while maintaining fast inference and low memory use, balancing accuracy against computational cost.
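The efficiency argument for SPDS can be illustrated with a rough count. In a non-local block, every query position is compared with every key position, so the affinity matrix has N × S entries, where N is the number of query positions and S the number of key/value positions. The sketch below compares a full non-local block (S = N) with one whose keys and values are pooled onto a small pyramid of grids. The function name `attention_cost`, the 64 × 64 × 64 feature-map size, and the grid sizes (1, 3, 6, 8) are assumptions chosen for illustration; the paper's actual settings are not given here.

```python
def attention_cost(height, width, channels, pooled_sizes=None):
    """Rough cost of one non-local block (hypothetical helper, not from the paper).

    Returns (num_keys, affinity_entries, multiply_adds).
    """
    num_queries = height * width
    if pooled_sizes is None:
        # Full non-local block: keys/values live on the same grid as the queries.
        num_keys = num_queries
    else:
        # Pyramid down-sampling: one s x s grid of pooled cells per level.
        num_keys = sum(s * s for s in pooled_sizes)
    affinity_entries = num_queries * num_keys
    # Q @ K^T and A @ V each take about N * S * C multiply-adds.
    multiply_adds = 2 * affinity_entries * channels
    return num_keys, affinity_entries, multiply_adds


full = attention_cost(64, 64, 64)
pooled = attention_cost(64, 64, 64, pooled_sizes=(1, 3, 6, 8))
reduction = full[1] / pooled[1]  # how much smaller the affinity matrix becomes
print(full, pooled, round(reduction, 1))
```

Under these assumed sizes the key/value set shrinks from 4096 positions to 110, so the affinity matrix and the two matrix products shrink by the same factor of about 37. The count ignores the cost of the pooling itself and of any learned projections.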
📝 Abstract
Recently, deep learning-based methods have dominated the image dehazing domain. Although sophisticated models have achieved very competitive dehazing performance, effective solutions for extracting useful features remain under-explored. In addition, the non-local network, which has made breakthroughs in many vision tasks, has not been appropriately applied to image dehazing. This paper therefore presents a multi-receptive-field non-local network (MRFNLN) consisting of the multi-stream feature attention block (MSFAB) and the cross non-local block (CNLB). We start by extracting richer features for dehazing. Specifically, we design a multi-stream feature extraction (MSFE) sub-block, which contains three parallel convolutions with different receptive fields (i.e., $1 \times 1$, $3 \times 3$, $5 \times 5$) for extracting multi-scale features. Following MSFE, we employ an attention sub-block to make the model adaptively focus on important channels/regions. The MSFE and attention sub-blocks constitute our MSFAB. Then, we design a cross non-local block (CNLB), which can capture long-range dependencies beyond the query. Instead of sharing the input source of the query branch, the key and value branches are enhanced by fusing more preceding features. CNLB is computation-friendly: it leverages a spatial pyramid down-sampling (SPDS) strategy to reduce computation and memory consumption without sacrificing performance. Last but not least, we present a novel detail-focused contrastive regularization (DFCR) that emphasizes low-level details and ignores high-level semantic information in the representation space. Comprehensive experimental results demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art dehazing methods with fewer than 1.5 million parameters.
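The cross non-local idea described above (queries from the current features, keys and values from a fusion of preceding features, with pyramid down-sampling on the key/value side) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pyramid_pool` and `cross_non_local`, the additive fusion, the grid sizes (1, 2, 4), the scaling by the square root of the channel count, and the residual connection are all choices made here, and the learned projections a real block would use are omitted.

```python
import numpy as np


def pyramid_pool(feat, sizes=(1, 2, 4)):
    """Average-pool an (H, W, C) map onto each s x s grid and stack the cells.

    Returns an (S, C) array with S = sum(s * s). Assumes H and W are divisible
    by every s (an assumption of this sketch).
    """
    h, w, c = feat.shape
    cells = []
    for s in sizes:
        # Split rows and columns into s blocks each, then average inside blocks.
        pooled = feat.reshape(s, h // s, s, w // s, c).mean(axis=(1, 3))
        cells.append(pooled.reshape(s * s, c))
    return np.concatenate(cells, axis=0)


def cross_non_local(current, preceding, sizes=(1, 2, 4)):
    """Cross non-local attention without learned projections (illustrative).

    current:   (H, W, C) features feeding the query branch.
    preceding: list of (H, W, C) earlier features fused into key/value.
    """
    h, w, c = current.shape
    query = current.reshape(h * w, c)            # (N, C), full resolution
    fused = current + sum(preceding)             # assumed fusion: plain addition
    key_value = pyramid_pool(fused, sizes)       # (S, C), with S much smaller than N

    logits = query @ key_value.T / np.sqrt(c)    # (N, S) affinities
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    attended = weights @ key_value               # (N, C)
    return current + attended.reshape(h, w, c), weights


rng = np.random.default_rng(0)
cur = rng.standard_normal((8, 8, 4))
prev = [rng.standard_normal((8, 8, 4)) for _ in range(2)]
out, weights = cross_non_local(cur, prev)
print(out.shape, weights.shape)
```

Each of the 64 query positions attends to only 21 pooled cells rather than to all 64 positions, which is where the computation and memory savings come from. The output keeps the input resolution because only the key/value side is down-sampled.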