🤖 AI Summary
Existing semantic segmentation methods for satellite time-series imagery generalize poorly when confronted with variable temporal sequence lengths, leading to significant performance degradation. To address this challenge, this work proposes TEA, the first approach to enable adaptive segmentation across diverse temporal sequence lengths. TEA uses a teacher model to perform multi-view knowledge distillation onto a student model that accepts inputs of adaptive temporal length, transferring knowledge through intermediate embeddings, prototypes, and soft labels. It additionally incorporates a dynamic aggregation strategy and an auxiliary full-sequence reconstruction task to strengthen temporal modeling. Extensive experiments on multiple benchmark datasets demonstrate that TEA substantially improves segmentation accuracy and remains robust and generalizable under varying temporal sequence lengths.
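The summary mentions knowledge transfer through intermediate embeddings, prototypes, and soft labels but gives no implementation details. Below is a minimal PyTorch sketch of what such a multi-view distillation objective could look like; the function name `multi_view_distill_loss`, the tensor shapes, the equal loss weighting, and the temperature `tau` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): combining the three distillation
# views, assuming per-pixel features of shape (B, C, H, W), logits of shape
# (B, K, H, W), and ground-truth labels of shape (B, H, W).
import torch
import torch.nn.functional as F

def multi_view_distill_loss(s_feat, t_feat, s_logits, t_logits, labels,
                            num_classes, tau=2.0):
    # 1) Intermediate embeddings: match student features to (frozen) teacher features.
    emb_loss = F.mse_loss(s_feat, t_feat.detach())

    # 2) Prototypes: align per-class mean features (class prototypes).
    proto_loss = 0.0
    for k in range(num_classes):
        mask = (labels == k).unsqueeze(1).float()        # (B, 1, H, W)
        if mask.sum() == 0:
            continue
        s_proto = (s_feat * mask).sum(dim=(0, 2, 3)) / mask.sum()
        t_proto = (t_feat.detach() * mask).sum(dim=(0, 2, 3)) / mask.sum()
        proto_loss = proto_loss + F.mse_loss(s_proto, t_proto)

    # 3) Soft labels: KL divergence between temperature-softened predictions,
    #    flattened to (B*H*W, K) so each pixel counts as one sample.
    s_log = F.log_softmax(s_logits / tau, dim=1).permute(0, 2, 3, 1).reshape(-1, num_classes)
    t_prob = F.softmax(t_logits.detach() / tau, dim=1).permute(0, 2, 3, 1).reshape(-1, num_classes)
    kd_loss = F.kl_div(s_log, t_prob, reduction="batchmean") * tau ** 2

    # Equal weighting here is an arbitrary choice for illustration.
    return emb_loss + proto_loss + kd_loss
```

In this reading, the teacher sees the full sequence while the student sees a variable-length subsequence, so the three terms pull the student's truncated-input representation toward the teacher's full-sequence one at the feature, class-prototype, and prediction levels.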
📝 Abstract
Crop mapping based on satellite image time series (SITS) holds substantial economic value in agricultural production, and parcel segmentation is an essential step in it. Existing approaches have achieved notable advances in SITS segmentation with predetermined sequence lengths. However, we found that these approaches overlook the generalization of models to scenarios with varying temporal lengths, leading to markedly poor segmentation results in such cases. To address this issue, we propose TEA, a TEmporal Adaptive SITS semantic segmentation method that enhances the model's resilience to varying sequence lengths. We introduce a teacher model that encapsulates global sequence knowledge to guide a student model that accepts inputs of adaptive temporal length. Specifically, the teacher shapes the student's feature space from the intermediate-embedding, prototype, and soft-label perspectives to transfer knowledge, while the student model is dynamically aggregated to mitigate knowledge forgetting. Finally, we introduce full-sequence reconstruction as an auxiliary task to further improve the quality of representations across inputs of varying temporal lengths. Through extensive experiments, we demonstrate that our method brings remarkable improvements on common benchmarks across inputs of different temporal lengths. Our code will be made publicly available.
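The abstract does not specify how the student model is dynamically aggregated or how the full-sequence reconstruction is supervised. The sketch below assumes an EMA-style weight average over student updates and an L1 reconstruction loss on the full-length sequence; both the mechanism and the names (`aggregate_student`, `reconstruction_loss`, `momentum`) are assumptions made for illustration only.

```python
# Hypothetical sketch of "dynamic aggregation" and the reconstruction auxiliary
# task; not the authors' implementation.
import torch
import torch.nn.functional as F

@torch.no_grad()
def aggregate_student(aggregated, student, momentum=0.99):
    """Blend the current student's weights into a running aggregate
    (one possible form of dynamic aggregation; buffers omitted for brevity)."""
    for p_agg, p_stu in zip(aggregated.parameters(), student.parameters()):
        p_agg.mul_(momentum).add_(p_stu, alpha=1.0 - momentum)
    return aggregated

def reconstruction_loss(decoder, student_feat, full_sequence):
    """Auxiliary task: reconstruct the full-length SITS from features the
    student computed on a (possibly truncated) temporal input."""
    recon = decoder(student_feat)            # assumed shape (B, T_full, C, H, W)
    return F.l1_loss(recon, full_sequence)
```

Under this reading, the aggregate would be updated after each student step and the reconstruction term added to the segmentation and distillation losses, encouraging the student's features to retain full-sequence temporal information even when only part of the sequence is observed.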