🤖 AI Summary
To address the performance degradation of face super-resolution (FSR) in computation-constrained scenarios, which is attributed to entangled frequency-domain features and inefficient resource allocation, this paper proposes FADPNet, a dual-path architecture that disentangles features in the frequency domain. The low-frequency path employs a Mamba-based state space model to enhance skin-tone and coarse-texture representation, while the high-frequency path adopts a CNN backbone with a Deep Position-Aware Attention (DPA) module and a lightweight High-Frequency Refinement (HFR) module for precise contour and fine-detail modeling. The two paths are coordinated by a frequency-aware mechanism, augmented by squeeze-and-excitation attention to improve channel-wise sensitivity. Extensive experiments are reported to show that the method outperforms state-of-the-art approaches on multiple benchmarks in terms of PSNR and SSIM while reducing both parameter count and FLOPs. The work is presented as the first to empirically validate that adaptive frequency-domain disentanglement improves both efficiency and reconstruction quality in FSR.
📝 Abstract
Face super-resolution (FSR) under a limited computational budget remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources and degraded FSR performance. CNNs are relatively sensitive to high-frequency facial features, such as component contours and facial outlines. Mamba, by contrast, excels at capturing low-frequency features such as facial color and fine-grained texture, and does so with lower complexity than Transformers. Motivated by these observations, we propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components and processes them in dedicated branches. For low-frequency regions, we introduce a Mamba-based Low-Frequency Enhancement Block (LFEB), which combines state-space attention with squeeze-and-excitation operations to extract low-frequency global interactions and emphasize informative channels. For high-frequency regions, we design a CNN-based Deep Position-Aware Attention (DPA) module to enhance spatially dependent structural details, complemented by a lightweight High-Frequency Refinement (HFR) module that further refines frequency-specific representations. With these designs, our method achieves an excellent balance between FSR quality and model efficiency, outperforming existing approaches.
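The decomposition and channel-gating ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not FADPNet's implementation. The low-pass operator (average pooling followed by nearest-neighbour upsampling), the function names `split_frequencies` and `squeeze_excite`, the tensor shapes, and the random weights are all assumptions chosen for illustration. The Mamba, DPA, and HFR modules are omitted entirely.

```python
import numpy as np

def split_frequencies(feat, k=4):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Assumed operator (not from the paper): the low-frequency part is a
    k x k average pool upsampled back by nearest neighbour; the
    high-frequency part is the residual, so low + high == feat.
    """
    c, h, w = feat.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    pooled = feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))
    low = np.repeat(np.repeat(pooled, k, axis=1), k, axis=2)
    high = feat - low
    return low, high

def squeeze_excite(feat, w1, w2):
    """Squeeze-and-excitation gating on a (C, H, W) feature map.

    Global average pool -> two-layer MLP with ReLU -> sigmoid gate,
    then rescale each channel by its gate value in (0, 1).
    """
    squeezed = feat.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,)
    return feat * gate[:, None, None]

# Hypothetical shapes: 8 channels, 16 x 16 spatial size, reduction ratio 4.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
low, high = split_frequencies(feat, k=4)

w1 = rng.standard_normal((2, 8)) * 0.1   # squeeze: 8 -> 2 channels
w2 = rng.standard_normal((8, 2)) * 0.1   # excite: 2 -> 8 channels
low_enhanced = squeeze_excite(low, w1, w2)

# Placeholder fusion: a real dual-path model would process `high`
# with its own branch before merging the two paths.
fused = low_enhanced + high
```

Because the high-frequency part is defined as a residual, the split is lossless: the two branches together carry exactly the information in the input, and each branch can be given a model suited to its content.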