Revisiting the Perception-Distortion Trade-off with Spatial-Semantic Guided Super-Resolution

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing trade-off between perceptual quality and fidelity in image super-resolution: diffusion models often hallucinate structures, while GAN-based approaches lack textural realism. The paper proposes SpaSemSR, a spatial-semantic guided diffusion framework. It combines two guidance strategies, spatially anchored textual prompts and semantically enhanced visual cues, and fuses them adaptively through a dedicated spatial-semantic attention module. Across multiple benchmarks, this design achieves a better perception-distortion balance, preserving high-fidelity structures while generating photorealistic textures.
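
For readers unfamiliar with the term, the perception-distortion trade-off is commonly written as a constrained optimization. The block below is that standard formulation, not notation taken from this paper; the symbols $X$ (ground-truth image), $Y$ (degraded input), $\hat{X}$ (restoration), $\Delta$ (a distortion measure such as squared error), and $d$ (a divergence between distributions) are introduced here for illustration only.

```latex
% Perception-distortion function: the best achievable perceptual quality
% (smallest divergence from the natural-image distribution) among all
% estimators whose expected distortion is at most D.
P(D) \;=\; \min_{p_{\hat{X}\mid Y}} \; d\!\left(p_X,\, p_{\hat{X}}\right)
\quad \text{subject to} \quad
\mathbb{E}\!\left[\Delta\!\left(X, \hat{X}\right)\right] \le D .
```

Because loosening the constraint (a larger $D$) can only enlarge the feasible set, $P(D)$ is non-increasing in $D$: tightening the distortion budget can only keep or worsen the best achievable perceptual quality. This is the tension the summary refers to.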

📝 Abstract
Image super-resolution (SR) aims to reconstruct high-resolution images with both high perceptual quality and low distortion, but is fundamentally limited by the perception-distortion trade-off. GAN-based SR methods reduce distortion but still struggle to produce realistic fine-grained textures, whereas diffusion-based approaches synthesize rich details but often deviate from the input, hallucinating structures and degrading fidelity. This tension raises a key challenge: how to exploit the powerful generative priors of diffusion models without sacrificing fidelity. To address this, we propose SpaSemSR, a spatial-semantic guided diffusion framework with two complementary forms of guidance. First, spatial-grounded textual guidance integrates object-level spatial cues with semantic prompts, aligning textual and visual structures to reduce distortion. Second, semantic-enhanced visual guidance, built on a multi-encoder design and semantic degradation constraints, unifies multimodal semantic priors and improves perceptual realism under severe degradations. Both forms of guidance are adaptively fused into the diffusion process via spatial-semantic attention, suppressing distortion and hallucination while retaining the strengths of diffusion models. Extensive experiments on multiple benchmarks show that SpaSemSR achieves a superior perception-distortion balance, producing restorations that are both realistic and faithful.
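
The abstract does not specify how the spatial-semantic attention fuses the two guidance streams. As a rough illustration of what "adaptive fusion" could look like, the sketch below lets image latents attend separately to text tokens and visual tokens, then mixes the two results with a per-position sigmoid gate. Everything here is an assumption made for illustration: the function names (`cross_attention`, `fuse_guidance`), the gate parameters (`w_gate`, `b_gate`), the tensor shapes, and the omission of learned query/key/value projections. It is not the SpaSemSR implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention: N image queries attend to M guidance tokens.

    Simplification (assumption): no learned projections; context serves as
    both keys and values.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)    # (N, M)
    return softmax(scores, axis=-1) @ context    # (N, d)

def fuse_guidance(latent, text_tokens, visual_tokens, w_gate, b_gate=0.0):
    """Hypothetical adaptive fusion of textual and visual guidance.

    A per-position gate in (0, 1) decides how much each latent position
    relies on the text stream versus the visual stream.
    """
    text_out = cross_attention(latent, text_tokens)     # (N, d)
    vis_out = cross_attention(latent, visual_tokens)    # (N, d)
    gate = 1.0 / (1.0 + np.exp(-(latent @ w_gate + b_gate)))  # (N, 1)
    fused = gate * text_out + (1.0 - gate) * vis_out
    return latent + fused, gate                         # residual connection

# Toy shapes: 16 latent positions, 5 text tokens, 7 visual tokens, width 8.
rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 8))
text_tokens = rng.standard_normal((5, 8))
visual_tokens = rng.standard_normal((7, 8))
w_gate = rng.standard_normal((8, 1))

out, gate = fuse_guidance(latent, text_tokens, visual_tokens, w_gate)
print(out.shape, gate.shape)  # (16, 8) (16, 1)
```

With all-zero gate weights the gate is 0.5 everywhere, so the two streams are simply averaged; a trained gate would instead shift weight toward whichever stream is more reliable at each position.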

Problem

Research questions and friction points this paper is trying to address.

perception-distortion trade-off
image super-resolution
diffusion models
fidelity
hallucination

Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial-semantic guidance
diffusion models
perception-distortion trade-off
super-resolution
multimodal priors