SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address reconstruction artifacts in remote sensing image super-resolution (RSISR) caused by neglecting high-level semantic information, this paper proposes SeG-SR, a semantic-guided super-resolution framework. It introduces vision-language models (VLMs) to RSISR for the first time, integrating three core components: semantic feature extraction, semantic localization, and learnable semantic modulation, enabling end-to-end, scene-level semantic guidance of low-level reconstruction. The framework is backbone-agnostic, supporting cross-architecture adaptation and multi-scale feature fusion. Evaluated on two benchmark remote sensing datasets, SeG-SR achieves state-of-the-art performance, with significant gains in PSNR and SSIM. Qualitatively, the reconstructed images exhibit both high fidelity and semantic plausibility. The source code is publicly available.

📝 Abstract
High-resolution (HR) remote sensing imagery plays a vital role in a wide range of applications, including urban planning and environmental monitoring. However, due to limitations in sensors and data transmission links, the images acquired in practice often suffer from resolution degradation. Remote Sensing Image Super-Resolution (RSISR) aims to reconstruct HR images from low-resolution (LR) inputs, providing a cost-effective and efficient alternative to direct HR image acquisition. Existing RSISR methods primarily focus on low-level characteristics in pixel space, while neglecting the high-level understanding of remote sensing scenes. This may lead to semantically inconsistent artifacts in the reconstructed results. Motivated by this observation, our work aims to explore the role of high-level semantic knowledge in improving RSISR performance. We propose a Semantic-Guided Super-Resolution framework, SeG-SR, which leverages Vision-Language Models (VLMs) to extract semantic knowledge from input images and uses it to guide the super-resolution (SR) process. Specifically, we first design a Semantic Feature Extraction Module (SFEM) that utilizes a pretrained VLM to extract semantic knowledge from remote sensing images. Next, we propose a Semantic Localization Module (SLM), which derives a series of semantic guidance signals from the extracted semantic knowledge. Finally, we develop a Learnable Modulation Module (LMM) that uses the semantic guidance to modulate the features extracted by the SR network, effectively incorporating high-level scene understanding into the SR pipeline. We validate the effectiveness and generalizability of SeG-SR through extensive experiments: SeG-SR achieves state-of-the-art performance on two datasets and consistently delivers performance improvements across various SR architectures. Code is available at https://github.com/Mr-Bamboo/SeG-SR.
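
To make the SFEM → SLM → LMM pipeline concrete, here is a minimal PyTorch sketch of the three modules as the abstract describes them. The internals are assumptions (a frozen CLIP-style image encoder standing in for the pretrained VLM, linear heads producing per-stage guidance, and FiLM-style channel modulation); they illustrate the idea, not the authors' released implementation.

```python
# Minimal sketch of the SeG-SR pipeline from the abstract: SFEM -> SLM -> LMM.
# Internals are assumptions (frozen CLIP-style encoder, FiLM-style modulation),
# not the authors' implementation; see the official repo for the real code.
import torch
import torch.nn as nn


class SFEM(nn.Module):
    """Semantic Feature Extraction Module: a frozen, pretrained VLM image encoder."""
    def __init__(self, vlm_image_encoder):
        super().__init__()
        self.encoder = vlm_image_encoder  # assumed: e.g., a CLIP visual tower
        for p in self.encoder.parameters():
            p.requires_grad = False       # semantic knowledge comes from a frozen VLM

    def forward(self, lr_img):
        with torch.no_grad():
            return self.encoder(lr_img)   # (B, D) scene-level semantic embedding


class SLM(nn.Module):
    """Semantic Localization Module: derives per-stage guidance from the embedding."""
    def __init__(self, embed_dim, sr_channels, num_stages):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(embed_dim, 2 * sr_channels) for _ in range(num_stages)]
        )

    def forward(self, sem):
        # One (scale, shift) guidance pair per SR backbone stage.
        return [head(sem).chunk(2, dim=1) for head in self.heads]


class LMM(nn.Module):
    """Learnable Modulation Module: FiLM-style modulation of SR features."""
    def forward(self, feat, guidance):
        scale, shift = guidance             # each (B, C)
        scale = scale[:, :, None, None]     # broadcast over the (H, W) grid
        shift = shift[:, :, None, None]
        return feat * (1 + scale) + shift
```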
Problem

Research questions and friction points this paper is trying to address.

Enhance remote sensing image resolution using semantic knowledge
Address semantic inconsistency in super-resolved remote sensing images
Integrate vision-language models for improved super-resolution performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates vision-language models for semantic knowledge
Uses semantic guidance to modulate SR features (see the wiring sketch after this list)
Achieves state-of-the-art performance on two benchmark remote sensing datasets
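
Because the guidance is injected by modulating intermediate features rather than by changing the backbone itself, the scheme is backbone-agnostic. A hypothetical wiring into a generic SR backbone is sketched below; `clip_visual_encoder` and the `backbone.shallow` / `backbone.stages` / `backbone.upsampler` names are placeholders, not the paper's API.

```python
# Hypothetical wiring into an arbitrary SR backbone; all names below
# (clip_visual_encoder, backbone.shallow/stages/upsampler) are placeholders.
sfem = SFEM(clip_visual_encoder)
slm = SLM(embed_dim=512, sr_channels=64, num_stages=4)
lmm = LMM()

sem = sfem(lr_img)                            # scene-level semantic knowledge
guidance = slm(sem)                           # one guidance pair per stage

feat = backbone.shallow(lr_img)               # shallow feature extraction
for stage, g in zip(backbone.stages, guidance):
    feat = lmm(stage(feat), g)                # modulate features after each stage
sr_img = backbone.upsampler(feat)             # reconstruct the HR image
```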
Bo-Ying Chen
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing 100191, China; and Key Laboratory of Spacecraft Design Optimization and Dynamic Simulation Technologies, Ministry of Education, Beihang University, Beijing 100191, China.
Keyan Chen
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing 100191, China; and Key Laboratory of Spacecraft Design Optimization and Dynamic Simulation Technologies, Ministry of Education, Beihang University, Beijing 100191, China.
Mohan Yang
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing 100191, China; and Key Laboratory of Spacecraft Design Optimization and Dynamic Simulation Technologies, Ministry of Education, Beihang University, Beijing 100191, China.
Zhengxia Zou
Beihang University
computer vision, image processing, remote sensing, games
Zhenwei Shi
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing 100191, China; and Key Laboratory of Spacecraft Design Optimization and Dynamic Simulation Technologies, Ministry of Education, Beihang University, Beijing 100191, China.