MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution

πŸ“… 2025-03-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing text-guided image super-resolution (SR) methods suffer from two key limitations: (1) coarse- and single-grained semantic guidance that neglects multi-scale structural characteristics and fine-grained semantic requirements of images; and (2) reliance on monolithic, abstract textual descriptions, leading to structural distortion and semantically impoverished reconstructions. To address these, we propose MegaSRβ€”the first framework to introduce patch-level customized semantic mining, enabling image-attribute-driven dynamic semantic injection. We design a multi-stage explicit guidance aggregation strategy and empirically validate HED edges, depth maps, and segmentation masks as optimal complementary geometric and semantic priors. Leveraging a text-to-image diffusion backbone, MegaSR integrates multimodal guidance modulation with stage-wise feature aggregation. Extensive experiments on multiple Real-ISR benchmarks demonstrate substantial improvements in semantic richness and structural fidelity, consistently outperforming state-of-the-art methods both quantitatively and qualitatively.

πŸ“ Abstract
Pioneering text-to-image (T2I) diffusion models have ushered in a new era of real-world image super-resolution (Real-ISR), significantly enhancing the visual perception of reconstructed images. However, existing methods typically integrate uniform abstract textual semantics across all blocks, overlooking the distinct semantic requirements at different depths and the fine-grained, concrete semantics inherently present in the images themselves. Moreover, relying solely on a single type of guidance further disrupts the consistency of reconstruction. To address these issues, we propose MegaSR, a novel framework that mines customized block-wise semantics and expressive guidance for diffusion-based ISR. Compared to uniform textual semantics, MegaSR enables flexible adaptation to multi-granularity semantic awareness by dynamically incorporating image attributes at each block. Furthermore, we experimentally identify HED edge maps, depth maps, and segmentation maps as the most expressive guidance, and propose a multi-stage aggregation strategy to modulate them into the T2I models. Extensive experiments demonstrate the superiority of MegaSR in terms of semantic richness and structural consistency.
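The abstract describes identifying HED edge maps, depth maps, and segmentation maps as expressive guidance, then aggregating them stage-wise into the T2I backbone. The paper's actual modulation mechanism is not detailed here; as a loose, purely illustrative toy (the function, map values, and per-stage weights below are all hypothetical, not MegaSR's implementation), a stage-wise weighted fusion of several guidance maps might look like:

```python
# Toy sketch of stage-wise multi-guidance aggregation.
# Assumption: each guidance map is fused via per-stage scalar weights;
# the real MegaSR modulation is learned and more elaborate.

def fuse_guidance(guidance_maps, stage_weights):
    """Weighted sum of flattened guidance maps for one stage."""
    size = len(next(iter(guidance_maps.values())))
    fused = [0.0] * size
    for name, weight in stage_weights.items():
        for i, value in enumerate(guidance_maps[name]):
            fused[i] += weight * value
    return fused

# Three 2x2 guidance maps, flattened (hypothetical values).
maps = {
    "hed_edge":     [1.0, 0.0, 0.0, 1.0],
    "depth":        [0.2, 0.4, 0.6, 0.8],
    "segmentation": [0.0, 1.0, 1.0, 0.0],
}

# Hypothetical per-stage weights: early stages lean on geometric
# structure (edges/depth), later stages on semantics (segmentation).
stages = [
    {"hed_edge": 0.6, "depth": 0.3, "segmentation": 0.1},
    {"hed_edge": 0.2, "depth": 0.3, "segmentation": 0.5},
]

per_stage = [fuse_guidance(maps, w) for w in stages]
```

The point of the sketch is only the structure of the idea: a single guidance type gives one fixed signal, whereas combining complementary maps with stage-dependent weights lets different depths of the network receive different geometric/semantic emphasis.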
Problem

Research questions and friction points this paper is trying to address.

Existing methods inject uniform, abstract textual semantics across all network blocks, ignoring depth-specific semantic needs.
Reconstructed images lack semantic richness and structural consistency.
Reliance on a single type of guidance disrupts reconstruction consistency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Customized block-wise semantics for ISR
Multi-granularity semantic awareness adaptation
Multi-stage aggregation of expressive guidance
Xinrui Li
Harbin Institute of Technology, Shenzhen
Jianlong Wu
Professor, Harbin Institute of Technology (Shenzhen)
Computer Vision, Multimodal Learning
Xinchuan Huang
Terminus Group
Chong Chen
Terminus Group
Weili Guan
Harbin Institute of Technology, Shenzhen
Xiansheng Hua
Terminus Group
Liqiang Nie
Harbin Institute of Technology, Shenzhen