Multigrain-aware Semantic Prototype Scanning and Tri-Token Prompt Learning Embraced High-Order RWKV for Pan-Sharpening

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of conventional RWKV in remote sensing image fusion: its raster scanning is semantically agnostic, so it struggles to model multi-granularity semantic structures and is susceptible to positional bias. The authors propose a multigrain-aware semantic prototype scanning paradigm built on a high-order RWKV architecture and a tri-token prompt learning framework. Semantic prototype tokens are generated via clustering guided by locality-sensitive hashing, and the prompting mechanism combines a global token, prototype tokens, and a learnable register token. In addition, center difference convolution on the value pathway injects high-frequency detail, and an invertible multi-scale Q-shift operation provides lossless feature transformation without parameter-heavy receptive field expansion. The authors report gains over state-of-the-art approaches across multiple remote sensing benchmarks, in both spatial detail and spectral fidelity.
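The idea of grouping tokens by locality-sensitive hashing and scanning them bucket by bucket can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random-hyperplane (sign) hashing is just one common LSH family chosen here for illustration, the prototype is assumed to be the mean of each bucket, and the names `lsh_bucket_ids` and `prototype_scan` are hypothetical.

```python
import numpy as np


def lsh_bucket_ids(tokens, num_bits=4, seed=0):
    """Assign each token a bucket id via random-hyperplane (sign) hashing.

    tokens: (N, D) array of token features.
    Assumption: this hash family is an illustrative choice, not the paper's.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((tokens.shape[1], num_bits))
    bits = (tokens @ planes) > 0               # (N, num_bits) boolean codes
    weights = 2 ** np.arange(num_bits)         # binary place values 1, 2, 4, ...
    return bits.astype(np.int64) @ weights     # (N,) integer bucket ids


def prototype_scan(tokens, num_bits=4, seed=0):
    """Reorder tokens so that tokens sharing a bucket become contiguous.

    Returns the reordered tokens, one prototype per non-empty bucket
    (assumed here to be the bucket mean), the scan order, and the inverse
    permutation that restores the original raster order.
    """
    ids = lsh_bucket_ids(tokens, num_bits, seed)
    order = np.argsort(ids, kind="stable")     # keeps raster order inside a bucket
    prototypes = np.stack(
        [tokens[ids == b].mean(axis=0) for b in np.unique(ids)]
    )
    inverse = np.argsort(order)                # undoes the reordering
    return tokens[order], prototypes, order, inverse
```

Under these assumptions, a smaller `num_bits` yields fewer, coarser buckets and a larger value yields finer ones, which is one plausible way to obtain prototypes at several granularities. Indexing the reordered sequence with `inverse` recovers the original token order after sequence modeling.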

📝 Abstract
In this work, we propose a Multigrain-aware Semantic Prototype Scanning paradigm for pan-sharpening, built upon a high-order RWKV architecture and a tri-token prompting mechanism derived from semantic clustering. Specifically, our method contains three key components: 1) Multigrain-aware Semantic Prototype Scanning. Although RWKV offers an efficient linear-complexity alternative to Transformers, its conventional bidirectional raster scanning is still semantic-agnostic and prone to positional bias. To address this issue, we introduce a semantic-driven scanning strategy that leverages locality-sensitive hashing to group semantically related regions and construct multi-grain semantic prototypes, enabling context-aware token reordering and more coherent global interaction. 2) Tri-token Prompt Learning. We design a tri-token prompting mechanism consisting of a global token, cluster-derived prototype tokens, and a learnable register token. The global and prototype tokens provide complementary semantic priors for RWKV modeling, while the register token helps suppress noisy and artifact-prone intermediate representations. 3) Invertible Q-Shift. To counteract the loss of spatial details, we apply center difference convolution on the value pathway to inject high-frequency information, and introduce an invertible multi-scale Q-shift operation for efficient and lossless feature transformation without parameter-heavy receptive field expansion. Experimental results demonstrate the superiority of our method.
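The center difference convolution mentioned in component 3 can be sketched in its commonly used form, y = conv(x, w) - theta * x * sum(w), where the second term subtracts a scaled copy of the center pixel so that flat regions are suppressed and edges are emphasized. The sketch below is a single-channel NumPy illustration under that assumption; the mixing weight `theta`, the zero padding, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def conv2d_same(x, w):
    """Plain k x k cross-correlation with zero padding on one channel."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)                        # zero-pad all four borders
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def center_difference_conv(x, w, theta=0.7):
    """Center difference convolution: conv(x, w) - theta * x * sum(w).

    theta = 0 gives the ordinary convolution; theta = 1 responds only to
    differences between each neighbor and the center pixel.
    """
    return conv2d_same(x, w) - theta * x * w.sum()
```

With `theta=1.0`, a constant image produces a zero response away from the zero-padded border, which shows why this operator passes high-frequency content while damping smooth regions.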
Problem

Research questions and friction points this paper is trying to address.

pan-sharpening
semantic-agnostic scanning
positional bias
multi-grain semantics
context-aware modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multigrain-aware Semantic Prototype Scanning
Tri-Token Prompt Learning
High-Order RWKV
Invertible Q-Shift
Pan-Sharpening
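
The Invertible Q-Shift item listed above can be illustrated with a parameter-free sketch. The abstract gives no implementation details, so everything here is an assumption: channels are split into groups, each group is shifted in one of four directions at one of several step sizes, and the shift is circular (`np.roll`) so that it can be undone exactly. The name `multiscale_q_shift` and the `scales` parameter are hypothetical.

```python
import numpy as np

# Four shift directions as (dy, dx): down, up, right, left.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def multiscale_q_shift(x, scales=(1, 2), inverse=False):
    """Shift channel groups of x (C, H, W) in four directions at several scales.

    Assumption: a circular roll is used so no pixels are discarded, which
    makes the operation exactly invertible and free of learnable parameters.
    """
    out = x.copy()
    groups = np.array_split(np.arange(x.shape[0]), 4 * len(scales))
    g = 0
    for s in scales:
        for dy, dx in DIRECTIONS:
            idx = groups[g]
            g += 1
            sign = -1 if inverse else 1        # inverse rolls back by the same amount
            out[idx] = np.roll(x[idx], shift=(sign * dy * s, sign * dx * s), axis=(1, 2))
    return out
```

Calling the function again with `inverse=True` restores the input exactly, since each group is rolled back by the same offset. A zero-padded shift would not have this property, because pixels pushed past the border would be lost.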