Large-Small Model Collaboration for Farmland Semantic Change Detection

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Semantic change detection for farmland is held back by a lack of fine-grained “from-to” annotations and by pseudo-changes arising from crop rotation, seasonal variation, and illumination differences. To address these limitations, this work introduces HZNU-FCD, a fine-grained benchmark of 4,588 bitemporal image pairs with pixel-level labels, annotated under a unified protocol with five farmland-to-non-farmland transition classes. The authors propose a large-small collaborative framework: a compact FD-Mamba model learns dense change representations, while the CMLA pathway, built on a frozen large vision-language model, uses CLIP-based textual priors for prompt-guided semantic arbitration and pseudo-change suppression. The two are coupled by a hard-region co-training strategy that supervises the CMLA semantic score map only on low-confidence pixels. With only 6.65 million trainable parameters, the method achieves an F1 score of 97.63% on HZNU-FCD, 10.19 percentage points above ChangeCLIP-ViT, and generalizes to LEVIR-CD (91.43% F1) and WHU-CD (93.85% F1).
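
The hard-region co-training idea can be illustrated with a small sketch. According to the abstract, the CMLA semantic score map is supervised only on low-confidence pixels; everything else below is an assumption made for illustration: that confidence means the small model's top class probability, that the cutoff is 0.7, and that the loss is a cross-entropy. The function names `hard_region_mask` and `masked_cross_entropy` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def hard_region_mask(small_probs, threshold=0.7):
    """Flag pixels where the small model's top class probability is below threshold.

    small_probs: array of shape (C, H, W) holding per-class probabilities.
    threshold:   hypothetical confidence cutoff (not a value from the paper).
    """
    confidence = small_probs.max(axis=0)
    return confidence < threshold


def masked_cross_entropy(score_logits, labels, mask):
    """Mean cross-entropy of the score map, computed over masked pixels only.

    score_logits: (C, H, W) semantic score map, labels: (H, W) integer classes,
    mask: (H, W) boolean hard-region mask.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = score_logits - score_logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Log-probability assigned to the true class at every pixel.
    picked = np.take_along_axis(log_probs, labels[None, ...], axis=0)[0]
    n_hard = int(mask.sum())
    if n_hard == 0:
        return 0.0  # no hard pixels, so no supervision signal
    return float(-(picked * mask).sum() / n_hard)


# Toy 2x2 image with two classes; two pixels fall below the assumed cutoff.
p0 = np.array([[0.90, 0.60], [0.55, 0.95]])
small_probs = np.stack([p0, 1.0 - p0])
labels = np.array([[0, 1], [1, 0]])
score_logits = np.zeros((2, 2, 2))

mask = hard_region_mask(small_probs)
loss = masked_cross_entropy(score_logits, labels, mask)
print(int(mask.sum()), loss)
```

In this sketch, pixels where the small model is already confident contribute nothing to the loss, so the supervision of the score map is concentrated on ambiguous regions where semantic arbitration is most likely to matter.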

📝 Abstract

Farmland Semantic Change Detection (SCD) is essential for cultivated land protection, yet existing benchmarks and models remain insufficient for fine-grained farmland conversion monitoring. Current datasets often lack dedicated "from-to" annotations, while visual change detection models are easily disturbed by phenology-induced pseudo-changes caused by crop rotation, seasonal variation, and illumination differences. To address these challenges, we construct HZNU-FCD, a large-scale fine-grained farmland SCD benchmark with a unified five-class farmland-to-non-farmland annotation protocol. It contains 4,588 bitemporal image pairs with pixel-level labels for practical farmland protection. Based on this benchmark, we propose a large-small collaborative SCD framework that integrates a task-driven small visual model with a frozen large vision-language model. The small model, Fine-grained Difference-aware Mamba (FD-Mamba), learns dense change representations for boundary preservation and small-region localization. The large-model pathway, Cross-modal Logical Arbitration (CMLA), introduces CLIP-based textual priors for prompt-guided semantic arbitration and pseudo-change suppression. To enable effective collaboration, we design a hard-region co-training strategy that supervises the CMLA semantic score map only on low-confidence pixels. Experiments show that our method achieves 97.63% F1, 96.32% IoU, and 96.35% SCD_IoU_mean on HZNU-FCD with only 6.65M trainable parameters. Compared with the multimodal ChangeCLIP-ViT, which leverages vision-language information for change detection, our method improves F1 by 10.19 percentage points on HZNU-FCD. It also achieves 91.43% F1 and 84.21% IoU on LEVIR-CD, and 93.85% F1 and 88.41% IoU on WHU-CD, demonstrating strong robustness and generalization. The code is available at https://github.com/Lovelymili/FD-Mamba.
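
The reported F1 and IoU figures can be cross-checked with a short calculation. For a single binary confusion matrix, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), which gives F1 = 2·IoU / (1 + IoU). The sketch below applies that identity to the numbers quoted in the abstract. The helper name `f1_from_iou` is hypothetical. The LEVIR-CD and WHU-CD pairs match the identity to within rounding, while the HZNU-FCD pair does not; one plausible reason, which is an assumption and not stated in this listing, is that the HZNU-FCD metrics are averaged over several classes.

```python
def f1_from_iou(iou):
    """F1 implied by IoU for a single binary confusion matrix."""
    return 2.0 * iou / (1.0 + iou)


# (IoU %, F1 %) pairs as quoted in the abstract.
reported = {
    "LEVIR-CD": (84.21, 91.43),
    "WHU-CD": (88.41, 93.85),
    "HZNU-FCD": (96.32, 97.63),
}

# Gap between the F1 implied by the identity and the reported F1, in points.
gaps = {}
for name, (iou_pct, f1_pct) in reported.items():
    implied_pct = 100.0 * f1_from_iou(iou_pct / 100.0)
    gaps[name] = implied_pct - f1_pct
    print(f"{name}: reported F1 {f1_pct:.2f}, implied by IoU {implied_pct:.2f}")
```

A gap near zero means the pair is consistent with one binary change mask; a larger gap suggests the two metrics were aggregated differently, so they should not be converted into each other.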

Problem

Research questions and friction points this paper is trying to address.

Farmland Semantic Change Detection

pseudo-changes

fine-grained monitoring

bitemporal imagery

annotation protocol

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-Small Model Collaboration

Fine-grained Semantic Change Detection

Vision-Language Model