AI Summary
This work addresses two problems in one-step real-world image super-resolution (Real-ISR) with Diffusion Transformers (DiT): a trajectory mismatch that arises during distillation, and grid-like periodic artifacts caused by high-frequency spectral leakage. The authors propose StrSR, a one-step adversarial distillation framework that adds spectral and trajectory regularization. An asymmetric discriminative distillation architecture bridges the teacher-student trajectory gap, while a frequency distribution matching strategy suppresses the periodic artifacts. The authors report state-of-the-art results on Real-ISR benchmarks in both quantitative metrics and perceptual visual quality.
Abstract
Diffusion transformer (DiT) architectures show great potential for real-world image super-resolution (Real-ISR). However, their computationally expensive iterative sampling necessitates one-step distillation. Existing one-step distillation methods struggle with Real-ISR on DiT. They suffer from fundamental trajectory mismatch and generate severe grid-like periodic artifacts. To tackle these challenges, we propose StrSR, a novel one-step adversarial distillation framework featuring spectral and trajectory regularization. Specifically, we propose an asymmetric discriminative distillation architecture to bridge the trajectory gap. Additionally, we design a frequency distribution matching strategy to effectively suppress DiT-specific periodic artifacts caused by high-frequency spectral leakage. Extensive experiments demonstrate that StrSR achieves state-of-the-art performance in Real-ISR, across both quantitative metrics and visual perception. The code and models will be released at https://github.com/jkwang28/StrSR.
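To make the link between grid-like artifacts and the frequency domain concrete, the following minimal sketch shows that a pattern repeating every few pixels concentrates its spectral energy on a sparse set of harmonics. It is an illustration only and is not the StrSR loss or any part of the released code: the helper names `dft2_magnitude` and `grid_energy_ratio`, the 16x16 image size, and the 4-pixel grid period are all assumptions chosen for the example.

```python
import cmath
import math


def dft2_magnitude(img):
    """Naive 2D DFT magnitude of a square list-of-lists image (small N only)."""
    n = len(img)
    # Twiddle factors exp(-2*pi*i*k/n), reused for rows and columns.
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    # The 2D DFT is separable: transform each row, then each column.
    rows = [[sum(img[y][x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for y in range(n)]
    return [[abs(sum(rows[y][u] * w[(v * y) % n] for y in range(n)))
             for u in range(n)] for v in range(n)]


def grid_energy_ratio(img, period):
    """Fraction of non-DC spectral energy on the harmonics of a `period`-pixel grid.

    Hypothetical diagnostic for this example, not a metric from the paper.
    """
    n = len(img)
    step = n // period  # a pattern with period p peaks at multiples of n/p
    mag = dft2_magnitude(img)
    total = 0.0
    on_grid = 0.0
    for v in range(n):
        for u in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            energy = mag[v][u] ** 2
            total += energy
            if u % step == 0 and v % step == 0:
                on_grid += energy
    return on_grid / total if total > 0 else 0.0


n, period = 16, 4
# Smooth "clean" image: low-frequency content only.
clean = [[0.5 + 0.25 * math.sin(2 * math.pi * x / n) + 0.25 * math.cos(2 * math.pi * y / n)
          for x in range(n)] for y in range(n)]
# Same image with a synthetic grid: every 4th row and column is brightened.
gridded = [[clean[y][x] + (0.5 if x % period == 0 or y % period == 0 else 0.0)
            for x in range(n)] for y in range(n)]

ratio_clean = grid_energy_ratio(clean, period)
ratio_gridded = grid_energy_ratio(gridded, period)
print(f"clean: {ratio_clean:.3f}  gridded: {ratio_gridded:.3f}")
```

In this toy setting the clean image places essentially none of its non-DC energy on the grid harmonics, while the gridded image places a large share there. That is the kind of spectral signature a frequency-domain matching objective could plausibly target, although the abstract does not specify how StrSR formulates it.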