NeXt2Former-CD: Efficient Remote Sensing Change Detection with Modern Vision Architectures

📅 2026-02-20
🤖 AI Summary
This work addresses performance degradation in remote sensing change detection caused by residual co-registration noise, small object-level spatial shifts, and semantic ambiguity in bi-temporal imagery. The authors propose NeXt2Former-CD, an end-to-end framework that combines a Siamese ConvNeXt encoder initialized with DINOv3 weights, a deformable attention-based temporal fusion module, and a Mask2Former decoder. On the LEVIR-CD, WHU-CD, and CDD benchmarks, the method achieves the best F1 score and IoU among the evaluated methods, including recent Mamba-based baselines. Despite a larger parameter count, its inference latency remains comparable to that of state-space models, which suggests it is practical for high-resolution change detection.

📝 Abstract
State Space Models (SSMs) have recently gained traction in remote sensing change detection (CD) for their favorable scaling properties. In this paper, we explore the potential of modern convolutional and attention-based architectures as a competitive alternative. We propose NeXt2Former-CD, an end-to-end framework that integrates a Siamese ConvNeXt encoder initialized with DINOv3 weights, a deformable attention-based temporal fusion module, and a Mask2Former decoder. This design is intended to better tolerate residual co-registration noise and small object-level spatial shifts, as well as semantic ambiguity in bi-temporal imagery. Experiments on LEVIR-CD, WHU-CD, and CDD datasets show that our method achieves the best results among the evaluated methods, improving over recent Mamba-based baselines in both F1 score and IoU. Furthermore, despite a larger parameter count, our model maintains inference latency comparable to SSM-based approaches, suggesting it is practical for high-resolution change detection tasks.
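
The abstract reports results in terms of F1 score and IoU. As a reminder of what these measure for a binary change mask, here is a minimal sketch in plain Python. The function name `change_metrics` and the toy masks are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ (for example in how it aggregates over images).

```python
def change_metrics(pred, target):
    """Return (precision, recall, f1, iou) for the 'change' class.

    pred and target are 2D lists of 0/1 values with the same shape,
    where 1 marks a changed pixel.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # changed pixel correctly detected
            elif p and not t:
                fp += 1          # false alarm
            elif t and not p:
                fn += 1          # missed change

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # IoU is the overlap of predicted and true change regions over their union.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, recall, f1, iou


# Toy example (hypothetical masks): 2 true positives, 1 false alarm, 1 miss.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
precision, recall, f1, iou = change_metrics(pred, target)
```

For a single mask the two scores are tied by F1 = 2·IoU / (1 + IoU), so they always rank predictions on that mask the same way.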
Problem

Research questions and friction points this paper is trying to address.

change detection
remote sensing
co-registration noise
spatial shift
semantic ambiguity
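
To illustrate why co-registration noise and spatial shift are friction points, the toy sketch below shifts an unchanged scene by one pixel and applies naive pixel-wise differencing. The scene, the helper names `shift_right` and `naive_change_map`, and the differencing baseline are assumptions made for illustration; they are not the method described in the paper.

```python
def shift_right(image, dx):
    """Shift every row right by dx pixels, padding with 0 on the left."""
    return [[0] * dx + row[:len(row) - dx] for row in image]


def naive_change_map(before, after):
    """Mark a pixel as changed (1) wherever the two images differ."""
    return [[int(a != b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(before, after)]


# A small "building" that does not change between the two dates.
scene = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Simulate a one-pixel registration residual in the second acquisition.
shifted = shift_right(scene, 1)

# Every flagged pixel here is a false change caused only by misalignment.
false_changes = sum(sum(row) for row in naive_change_map(scene, shifted))
```

In this toy case the false detections sit on the left and right edges of the object, which is the kind of error a fusion module that tolerates small shifts is intended to suppress.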
Innovation

Methods, ideas, or system contributions that make the work stand out.

ConvNeXt
Deformable Attention
Mask2Former
Change Detection
Remote Sensing