🤖 AI Summary
This work addresses the challenges of fine-grained multiclass change detection in remote sensing, where complex scene dynamics and scarce pixel-level annotations hinder accurate monitoring. To this end, the authors propose a Tri-path DINO architecture that leverages the DINOv3 pretrained model as its backbone and constructs a main-auxiliary cooperative three-branch Siamese network to jointly capture semantic-level changes and structural details through complementary feature learning. A multi-scale parallel convolutional attention mechanism is further integrated into the decoder to enhance contextual awareness and interpretability. The method achieves state-of-the-art performance on both the Gaza infrastructure damage assessment benchmark and the SECOND dataset, with Grad-CAM visualizations confirming the effectiveness of the specialized roles assigned to each network path.
📝 Abstract
In remote sensing imagery, multi-class change detection (MCD) is crucial for fine-grained monitoring, yet it has long been constrained by complex scene variations and the scarcity of detailed annotations. To address this, we propose the Tri-path DINO architecture, which adopts a three-path complementary feature learning strategy to help pre-trained foundation models adapt rapidly to complex vertical domains. Specifically, the main path employs the pre-trained DINOv3 model as the backbone feature extractor to learn coarse-grained features. An auxiliary path, which also adopts a Siamese structure, progressively aggregates intermediate features from the Siamese encoder to strengthen the learning of fine-grained features. Finally, a multi-scale attention mechanism augments the decoder network: parallel convolutions adaptively capture and enhance contextual information under different receptive fields. The proposed method achieves the best performance on the MCD task on both the Gaza facility damage assessment dataset (Gaza change) and the classic SECOND dataset. Grad-CAM visualizations further confirm that the main and auxiliary paths naturally focus on coarse-grained semantic changes and fine-grained structural details, respectively. This complementarity provides a robust and interpretable solution for advanced change detection tasks, offering a basis for rapid and accurate damage assessment.
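The idea of parallel convolutions with different receptive fields feeding an attention gate can be illustrated with a minimal, dependency-free sketch. It is not the authors' decoder: the kernel sizes `(1, 3, 5)`, the use of fixed mean filters in place of learned convolution weights, the averaging of branches, and the sigmoid gate are all assumptions made here for illustration, and the function names `box_filter` and `multi_scale_attention` are hypothetical.

```python
import math


def box_filter(feat, k):
    """Apply a k x k mean filter with zero padding.

    Assumption: a fixed mean filter stands in for a learned convolution.
    """
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += feat[y][x]
            out[i][j] = total / (k * k)
    return out


def multi_scale_attention(feat, kernel_sizes=(1, 3, 5)):
    """Gate a single-channel feature map with context from parallel branches.

    Each branch sees a different receptive field; their average is squashed
    by a sigmoid into a per-pixel weight that rescales the input.
    Assumption: kernel sizes, averaging, and the sigmoid gate are illustrative.
    """
    branches = [box_filter(feat, k) for k in kernel_sizes]
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            context = sum(b[i][j] for b in branches) / len(branches)
            gate = 1.0 / (1.0 + math.exp(-context))
            out[i][j] = feat[i][j] * gate
    return out


# Toy 3 x 3 feature map with a single strong response in the centre.
feat = [
    [0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
]
attended = multi_scale_attention(feat)
```

In this toy case the output keeps the input's shape, and the centre response is scaled by a gate between 0.5 and 1 because its multi-scale context is positive. A real decoder would operate on multi-channel tensors with learned kernels in a deep learning framework.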