AI Summary
Remote sensing data exhibit strong temporal-spectral-spatial (TSS) heterogeneity, leading to poor model generalization and high task-specific adaptation costs.
Method: This paper proposes a unified dense prediction framework that (i) introduces TSS-decoupled encoding and metadata-driven normalization for robust representation learning; (ii) designs a local-global window attention mechanism to jointly model fine-grained details and global context; and (iii) constructs a plug-and-play unified output head enabling dual decoupling of input configurations and output structures.
Contribution/Results: Without task-specific retraining, the single model supports zero-shot transfer across diverse dense prediction tasks, including semantic segmentation and change detection. Evaluated on multiple multi-source remote sensing benchmarks, it matches or surpasses state-of-the-art performance, improving cross-task and cross-sensor generalization while enhancing deployment efficiency.
Abstract
The proliferation of diverse remote sensing data has spurred advancements in dense prediction tasks, yet significant challenges remain in handling data heterogeneity. Remote sensing imagery exhibits substantial variability across temporal, spectral, and spatial (TSS) dimensions, complicating unified data processing. Current deep learning models for dense prediction tasks, such as semantic segmentation and change detection, are typically tailored to specific input-output configurations. Consequently, variations in data dimensionality or task requirements often lead to significant performance degradation or model incompatibility, necessitating costly retraining or fine-tuning efforts for different application scenarios. This paper introduces the Temporal-Spectral-Spatial Unified Network (TSSUN), a novel architecture designed for unified representation and modeling of remote sensing data across diverse TSS characteristics and task types. TSSUN employs a Temporal-Spectral-Spatial Unified Strategy that leverages meta-information to decouple and standardize input representations from varied temporal, spectral, and spatial configurations, and similarly unifies output structures for different dense prediction tasks and class numbers. Furthermore, a Local-Global Window Attention mechanism is proposed to efficiently capture both local contextual details and global dependencies, enhancing the model's adaptability and feature extraction capabilities. Extensive experiments on multiple datasets demonstrate that a single TSSUN model effectively adapts to heterogeneous inputs and unifies various dense prediction tasks. The proposed approach consistently achieves or surpasses state-of-the-art performance, highlighting its robustness and generalizability for complex remote sensing applications without requiring task-specific modifications.
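The abstract does not say how the output side is unified across tasks and class numbers. As a rough illustration only, and not TSSUN's actual design, one common way to decouple a prediction head from a fixed class count is to score each pixel feature against a set of class embeddings supplied at inference time. In the sketch below, `unified_head`, `predict`, and the use of plain dot-product scoring are all hypothetical choices made for this example.

```python
import math


def unified_head(pixel_features, class_embeddings):
    """Return per-pixel class probabilities for any number of classes.

    pixel_features:   list of D-dimensional feature vectors, one per pixel.
    class_embeddings: list of K D-dimensional vectors, one per class.
    Both names are hypothetical; the paper's head may work differently.
    """
    probabilities = []
    for feature in pixel_features:
        # One logit per class: dot product of the pixel feature and the class embedding.
        logits = [
            sum(f * e for f, e in zip(feature, embedding))
            for embedding in class_embeddings
        ]
        # Numerically stable softmax over however many classes were supplied.
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        probabilities.append([x / total for x in exps])
    return probabilities


def predict(pixel_features, class_embeddings):
    """Pick the highest-probability class index for each pixel."""
    return [
        max(range(len(p)), key=p.__getitem__)
        for p in unified_head(pixel_features, class_embeddings)
    ]


# The same function serves a 2-class task (e.g. change / no change)
# and a 3-class task (e.g. land-cover segmentation) without modification.
pixels = [[1.0, 0.2], [0.1, 1.0], [-1.0, 0.3]]
binary_classes = [[1.0, 0.0], [-1.0, 0.0]]
land_cover_classes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
binary_map = predict(pixels, binary_classes)
land_cover_map = predict(pixels, land_cover_classes)
```

Because the number of output channels is determined by how many class embeddings are passed in, nothing in the head has to be resized or retrained when the class count changes. Under this assumption, change detection can be treated as a two-class instance of the same interface.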