🤖 AI Summary
This work addresses the prohibitive computational cost that hinders the deployment of diffusion models for remote sensing image super-resolution. The authors propose SlimDiffSR, a lightweight framework that combines an uncertainty-guided single-step teacher model, structured pruning tailored to remote sensing imagery, frequency-separable and direction-separable convolutions, and a query-driven global aggregation module. Maximum Mean Discrepancy (MMD) is incorporated into knowledge distillation to align the feature distributions of the teacher and student models. Compared with conventional multi-step diffusion models, the method attains up to 200× inference acceleration and a 20× reduction in parameters while maintaining competitive perceptual quality, and it is more efficient than existing lightweight diffusion baselines.
📝 Abstract
Diffusion models have recently achieved remarkable performance in image super-resolution (SR), but their high computational cost limits practical deployment in remote sensing applications. To address this issue, we propose SlimDiffSR, a lightweight and efficient diffusion-based framework for real-world remote sensing image super-resolution. Unlike existing single-step diffusion methods that rely on fixed timesteps, we first introduce an uncertainty-guided timestep assignment strategy to construct a stronger single-step teacher model, where reconstruction difficulty is explicitly linked to diffusion timesteps, enabling adaptive generative strength. Building upon this teacher, we further present a structured pruning strategy tailored to remote sensing imagery, which systematically removes redundant semantic modules and replaces standard operations with lightweight designs, including frequency-separable convolution, direction-separable convolution, and a query-driven global aggregation module. These components explicitly exploit the unique characteristics of remote sensing data, such as sparse high-frequency details, strong directional patterns, and long-range spatial dependencies. To enhance knowledge transfer, we incorporate Maximum Mean Discrepancy (MMD) into the distillation process to align feature distributions between the teacher and student models. Extensive experiments on multiple remote sensing benchmarks demonstrate that SlimDiffSR achieves a favorable balance between efficiency and reconstruction quality. In particular, it attains up to $200\times$ inference acceleration and a $20\times$ reduction in model parameters compared with multi-step diffusion models, while achieving competitive perceptual quality and clearly outperforming existing lightweight diffusion baselines in efficiency. The code is available at: https://github.com/wwangcece/SlimDiffSR.
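To make the MMD-based alignment mentioned above more concrete, the following is a minimal sketch of a squared-MMD estimate between two sets of feature vectors. It is not taken from the SlimDiffSR code: the Gaussian kernel, the single fixed bandwidth, the biased estimator, and the names `gaussian_kernel`, `mmd_squared`, `teacher_feats`, and `student_feats` are all assumptions chosen for illustration, and the paper may use a different kernel or estimator.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Pairwise RBF kernel k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    x has shape (n, d), y has shape (m, d); the result has shape (n, m).
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(teacher_feats, student_feats, bandwidth=1.0):
    """Biased estimate of squared MMD between two feature sets.

    Assumption: each row is one flattened feature vector. The kernel choice
    and bandwidth are illustrative, not the paper's settings.
    """
    k_tt = gaussian_kernel(teacher_feats, teacher_feats, bandwidth)
    k_ss = gaussian_kernel(student_feats, student_feats, bandwidth)
    k_ts = gaussian_kernel(teacher_feats, student_feats, bandwidth)
    return k_tt.mean() + k_ss.mean() - 2.0 * k_ts.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(64, 16))
    # A student whose features nearly match the teacher's.
    student_close = teacher + 0.01 * rng.normal(size=(64, 16))
    # A student whose features come from a shifted distribution.
    student_far = rng.normal(loc=2.0, size=(64, 16))
    print("close:", mmd_squared(teacher, student_close, bandwidth=4.0))
    print("far:  ", mmd_squared(teacher, student_far, bandwidth=4.0))
```

In a distillation setting, a term of this kind would typically be added to the student's training loss so that minimizing it pulls the student's feature distribution toward the teacher's. How the term is weighted and which layers it is applied to are details this sketch does not attempt to reproduce.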