🤖 AI Summary
This work addresses the challenges in remote sensing change detection posed by the limited global modeling capacity of CNNs and the high computational cost of Transformers. To this end, the authors propose ChangeRWKV, an efficient architecture based on the RWKV model that enables parallelization during training while maintaining linear time complexity during inference. The method employs a hierarchical RWKV encoder coupled with a Spatial-Temporal Fusion Module (STFM) to effectively align multi-scale spatial features and capture fine-grained temporal changes. Evaluated on the LEVIR-CD dataset, ChangeRWKV achieves an IoU of 85.46% and an F1 score of 92.16%, significantly reducing both parameter count and FLOPs compared to existing approaches, thereby offering a favorable balance between accuracy and computational efficiency.
📝 Abstract
Existing paradigms for remote sensing change detection are caught in a trade-off: CNNs excel at efficiency but lack global context, while Transformers capture long-range dependencies at a prohibitive computational cost. This paper introduces ChangeRWKV, a new architecture that reconciles this conflict. By building upon the Receptance Weighted Key Value (RWKV) framework, ChangeRWKV combines the parallelizable training of Transformers with the linear-time inference of RNNs. At its core, our approach features two key innovations: a hierarchical RWKV encoder that builds multi-resolution feature representations, and a novel Spatial-Temporal Fusion Module (STFM) engineered to resolve spatial misalignments across scales while distilling fine-grained temporal discrepancies. ChangeRWKV not only achieves state-of-the-art performance on the LEVIR-CD benchmark, with an 85.46% IoU and 92.16% F1 score, but does so while drastically reducing parameters and FLOPs compared to previous leading methods. This work demonstrates a new, efficient, and powerful paradigm for operational-scale change detection. Our code and model are publicly available.
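The linear-time inference property cited above comes from RWKV's attention-free WKV mixing, which replaces pairwise attention with a per-channel recurrence over the sequence. The following NumPy sketch is illustrative only: it shows the general RWKV-style recurrence (function name and simplifications are ours, not the authors' code, and real implementations add numerical-stability tricks and token/channel mixing):

```python
import numpy as np

def wkv_linear_scan(k, v, w, u):
    """Illustrative RWKV-style WKV mixing, computed in O(T) time.

    k, v: (T, C) key and value projections; w: (C,) per-channel decay
    parameter; u: (C,) "bonus" weight for the current token. This is a
    simplified sketch of the general mechanism, not ChangeRWKV itself.
    """
    T, C = k.shape
    num = np.zeros(C)   # running decayed sum of exp(k_i) * v_i
    den = np.zeros(C)   # running decayed sum of exp(k_i) (normalizer)
    out = np.empty((T, C))
    decay = np.exp(-np.exp(w))  # RWKV parameterizes decay as exp(-exp(w))
    for t in range(T):
        bonus = np.exp(u + k[t])
        # output mixes the accumulated past with a bonus-weighted current token
        out[t] = (num + bonus * v[t]) / (den + bonus)
        # update the state: one O(C) step per token, so O(T*C) overall
        e_k = np.exp(k[t])
        num = decay * num + e_k * v[t]
        den = decay * den + e_k
    return out
```

Because the state `(num, den)` is a fixed-size summary of the past, inference cost per token is constant, in contrast with attention's growing key/value cache.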