SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis

📅 2026-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to remote sensing (RS) image synthesis: the lack of a domain-specific generative prior and the high computational cost of training at high resolutions. Existing training-free resolution promotion methods apply a static positional scaling rule throughout denoising and therefore struggle to preserve mid- and high-frequency details. To overcome this, the authors propose SHARP. First, they fine-tune FLUX on over 100,000 curated RS images to obtain a domain-specific prior model, RS-FLUX. Second, they introduce a spectrum-aware rational fractional time schedule \( k_{\text{rs}}(t) \) into RoPE, so that positional scaling follows the diffusion process: strong promotion in the early denoising stages, where structural layout forms, and progressively weaker promotion later, as fine details are recovered. Without additional training, SHARP supports multi-scale high-resolution generation from a single set of hyperparameters. Across six square and rectangular resolutions it consistently outperforms existing training-free baselines on CLIP Score, Aesthetic Score, and HPSv2, with margins that widen at larger extrapolation factors, while incurring negligible computational overhead.

📝 Abstract

Text-to-image generation powered by Diffusion Transformers (DiTs) has made remarkable strides, yet remote sensing (RS) synthesis lags behind due to two barriers: the absence of a domain-specialized DiT prior and the prohibitive cost of training at the large resolutions that RS applications demand. Training-free resolution promotion via Rotary Position Embedding (RoPE) rescaling offers a practical remedy, but every existing method applies a static positional scaling rule throughout the denoising process. This uniform compression is particularly harmful for RS imagery, whose substantially denser medium- and high-frequency energy encodes the fine structures critical for aerial-scene realism, such as vehicles, building contours, and road markings. Addressing both challenges requires a domain-specialized generative prior coupled with a denoising-aware positional adaptation strategy. To this end, we fine-tune FLUX on over 100,000 curated RS images to build a strong domain prior (RS-FLUX), and propose Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion (SHARP), a training-free method that introduces a rational fractional time schedule k_rs(t) into RoPE. SHARP applies strong positional promotion during the early layout-formation stage and progressively relaxes it during detail recovery, aligning extrapolation strength with the frequency-progressive nature of diffusion denoising. Its resolution-agnostic formulation further enables robust multi-scale generation from a single set of hyperparameters. Extensive experiments across six square and rectangular resolutions show that SHARP consistently outperforms all training-free baselines on CLIP Score, Aesthetic Score, and HPSv2, with widening margins at more aggressive extrapolation factors and negligible computational overhead. Code and weights are available at https://github.com/bxuanz/SHARP.
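
To make the idea of a time-dependent RoPE scale concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not give the form of k_rs(t), so the rational function in `k_schedule`, its parameter `c`, the convention that t runs from 1 (noise) to 0 (image), and the choice to compress positions by dividing them by k are all assumptions made for illustration. The function names are hypothetical and do not come from the SHARP repository.

```python
import numpy as np

def k_schedule(t, scale, c=0.5):
    """Hypothetical rational schedule (an assumption, not the paper's k_rs(t)).

    Returns `scale` at t = 1 (start of denoising) and 1.0 at t = 0 (end),
    decreasing monotonically in between. `c` > 0 shapes the decay.
    """
    t = float(t)
    w = t / (t + c * (1.0 - t))  # rational weight in [0, 1]
    return 1.0 + (scale - 1.0) * w

def rope_angles(positions, dim, k, base=10000.0):
    """1-D RoPE angles with positions divided by k (assumed compression rule)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    pos = np.asarray(positions, dtype=np.float64) / k
    return np.outer(pos, inv_freq)  # shape [n, dim // 2]

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (shape [n, dim]) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Example: a 2x extrapolation, sampled at a few denoising times.
for t in (1.0, 0.5, 0.0):
    k = k_schedule(t, scale=2.0)
    angles = rope_angles(range(4), dim=8, k=k)
    print(f"t={t:.1f}  k={k:.3f}  max angle={angles.max():.3f}")
```

A static method would correspond to calling `rope_angles` with the same k at every step. The sketch only varies k with t, which is the property the abstract attributes to SHARP.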

Problem

Research questions and friction points this paper is trying to address.

remote sensing synthesis

high-resolution generation

diffusion models

positional embedding

frequency-aware adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformers

Remote Sensing Synthesis

Rotary Position Embedding