🤖 AI Summary
This work addresses two problems in remote sensing image super-resolution: limited spatial perception and texture hallucination. Both arise from highly imbalanced texture distributions, which are globally random yet locally clustered. To tackle this, the authors propose TexADiff, a framework that explicitly estimates a Relative Texture Density Map (RTDM) and uses it to guide a diffusion model in three ways: as a spatial conditioning input, as a loss modulation term, and as a dynamic adapter for the sampling schedule. This enables adaptive, texture-aware super-resolution reconstruction. Experiments show that the method achieves state-of-the-art or competitive quantitative metrics, faithfully recovers high-frequency details, effectively suppresses texture hallucination, and significantly improves downstream task performance.
📝 Abstract
Generative diffusion priors have recently achieved state-of-the-art performance in natural image super-resolution, demonstrating a powerful capability to synthesize photorealistic details. However, their direct application to remote sensing image super-resolution (RSISR) reveals significant shortcomings. Unlike natural images, remote sensing images exhibit a unique texture distribution in which ground objects are globally stochastic yet locally clustered, leading to highly imbalanced textures. This imbalance severely hinders the model's spatial perception. To address this, we propose TexADiff, a novel framework that begins by estimating a Relative Texture Density Map (RTDM) to represent the texture distribution. TexADiff then leverages this RTDM in three synergistic ways: as an explicit spatial conditioning signal to guide the diffusion process, as a loss modulation term to prioritize texture-rich regions, and as a dynamic adapter for the sampling schedule. These modifications are designed to endow the model with explicit texture-aware capabilities. Experiments demonstrate that TexADiff achieves superior or competitive quantitative metrics. Furthermore, qualitative results show that our model generates faithful high-frequency details while effectively suppressing texture hallucinations. This improved reconstruction quality also yields significant gains in downstream task performance. The source code of our method is available at https://github.com/ZezFuture/TexAdiff.
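
To make the loss-modulation role of the RTDM more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not say how the RTDM is estimated, so the sketch assumes a simple proxy (mean gradient magnitude per patch, min-max scaled to [0, 1]). The function names `relative_texture_density` and `texture_weighted_mse` and the parameters `patch` and `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np


def relative_texture_density(img, patch=8, eps=1e-8):
    """Hypothetical RTDM proxy (an assumption, not the paper's estimator).

    Averages the gradient magnitude over non-overlapping patches and
    min-max scales the result to [0, 1] relative to this image.
    """
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)

    # Crop so that height and width are multiples of the patch size.
    h, w = mag.shape
    h_c, w_c = h - h % patch, w - w % patch
    blocks = mag[:h_c, :w_c].reshape(h_c // patch, patch, w_c // patch, patch)
    density = blocks.mean(axis=(1, 3))

    # "Relative": scale by the range observed in this image.
    lo, hi = density.min(), density.max()
    rel = (density - lo) / (hi - lo + eps)

    # Expand each patch value back to pixel resolution.
    return np.kron(rel, np.ones((patch, patch)))


def texture_weighted_mse(pred, target, rtdm, alpha=1.0):
    """Squared error with per-pixel weights 1 + alpha * rtdm (assumed form).

    Texture-rich pixels (rtdm near 1) contribute more to the loss than
    flat regions (rtdm near 0).
    """
    weights = 1.0 + alpha * rtdm
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))
```

Under this assumed weighting, an error of the same magnitude costs more in a texture-rich region than in a flat one, which is one plausible reading of "prioritize texture-rich regions". The same map could also be concatenated to the low-resolution input as a conditioning channel, but how TexADiff actually does this is not stated in the abstract.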