Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper targets three weaknesses of one-step diffusion-based super-resolution: low fidelity, insufficient region-discriminative activation of generative priors, and misalignment between text prompts and the image regions they describe. It proposes CODSR, a controllable one-step diffusion network for image super-resolution, built from three components. An LQ-guided feature modulation module conditions the diffusion process on the original, uncompressed low-quality (LQ) input rather than only on its compressed encoding. A region-adaptive generative prior activation method increases perceptual richness without sacrificing local structural fidelity. A text-matching guidance strategy aligns text prompts with their corresponding semantic regions. While keeping the efficiency of one-step inference, CODSR is reported to achieve superior perceptual quality and competitive fidelity (e.g., PSNR) compared with state-of-the-art methods.
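For readers unfamiliar with the fidelity metric named in the summary, PSNR is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and restored images. The snippet below is a minimal sketch of that standard formula on flat pixel lists; the function name `psnr`, the 8-bit peak value of 255, and the sample pixel values are illustrative assumptions, not taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel positions.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 8-bit pixel values, for illustration only.
reference = [52, 55, 61, 59]
restored = [54, 55, 60, 57]
print(round(psnr(reference, restored), 2))
```

Higher PSNR means the restored image is closer to the reference pixel by pixel, which is why it is treated as a fidelity measure rather than a perceptual one.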

📝 Abstract
Recent diffusion-based one-step methods have shown remarkable progress in the field of image super-resolution, yet they remain constrained by three critical limitations: (1) inferior fidelity performance caused by the information loss from compression encoding of low-quality (LQ) inputs; (2) insufficient region-discriminative activation of generative priors; (3) misalignment between text prompts and their corresponding semantic regions. To address these limitations, we propose CODSR, a controllable one-step diffusion network for image super-resolution. First, we propose an LQ-guided feature modulation module that leverages original uncompressed information from LQ inputs to provide high-fidelity conditioning for the diffusion process. We then develop a region-adaptive generative prior activation method to effectively enhance perceptual richness without sacrificing local structural fidelity. Finally, we employ a text-matching guidance strategy to fully harness the conditioning potential of text prompts. Extensive experiments demonstrate that CODSR achieves superior perceptual quality and competitive fidelity compared with state-of-the-art methods with efficient one-step inference.
Problem

Research questions and friction points this paper is trying to address.

Improves fidelity in image super-resolution by using uncompressed low-quality inputs
Enhances region-specific generative priors for better perceptual richness
Aligns text prompts with corresponding semantic regions accurately
Innovation

Methods, ideas, or system contributions that make the work stand out.

LQ-guided feature modulation for high-fidelity conditioning
Region-adaptive generative prior activation for perceptual richness
Text-matching guidance strategy to align prompts with semantics
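The first two ideas above can be pictured with a small toy example. This page does not describe the actual CODSR architecture, so the following is only a sketch of two generic techniques that fit the descriptions: a scale-and-shift feature modulation (in the style of spatial feature transforms) and a per-position blend between a fidelity branch and a generative branch. The function names `modulate` and `region_blend`, the `(1 + scale)` form, and all numbers are hypothetical.

```python
def modulate(features, scale, shift):
    """Affine modulation: out_i = features_i * (1 + scale_i) + shift_i.

    In an LQ-guided setting, scale and shift would be predicted from the
    low-quality input; here they are passed in directly (assumption).
    """
    return [f * (1.0 + g) + b for f, g, b in zip(features, scale, shift)]


def region_blend(fidelity, generative, mask):
    """Per-position blend; mask_i in [0, 1] sets how much of the
    generative branch is used at position i (hypothetical mechanism)."""
    return [(1.0 - m) * f + m * g for f, g, m in zip(fidelity, generative, mask)]


# Toy flattened feature vector and modulation parameters (made up).
latent = [1.0, 2.0, 4.0]
scale = [0.0, 0.5, -0.5]
shift = [0.0, 1.0, 0.0]
conditioned = modulate(latent, scale, shift)
print(conditioned)  # [1.0, 4.0, 2.0]

# A mask of 0 keeps the fidelity feature, 1 uses the generative feature.
generative = [3.0, 0.0, 2.0]
mask = [0.0, 0.5, 1.0]
print(region_blend(conditioned, generative, mask))  # [1.0, 2.0, 2.0]
```

The blend mask illustrates the trade-off the paper describes: regions where the mask is near 0 stay close to the input-derived features, while regions near 1 lean on the generative prior.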
Hao Chen
School of Computer Science and Engineering, Nanjing University of Science and Technology
Junyang Chen
School of Computer Science and Engineering, Nanjing University of Science and Technology
Jinshan Pan
Nanjing University of Science and Technology
Computer Vision, Image Processing, Computational Photography, Machine Learning
Jiangxin Dong
Nanjing University of Science and Technology
Computer Vision, Machine Learning, Image Deblurring, Image Restoration, Image Enhancement