🤖 AI Summary
Agricultural field boundary extraction from remote sensing is hindered by cloud contamination in optical imagery; existing approaches require labor-intensive manual cloud removal and generalize poorly across varying cloud conditions.
Method: This paper proposes an end-to-end 3D Vision Transformer framework tailored for Sentinel-1 (S1) and Sentinel-2 (S2) time-series data. It introduces a novel memory-efficient attention mechanism that jointly models spatiotemporal dependencies without human intervention, enabling robust fusion of multi-source SAR and optical observations.
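The "memory-efficient attention" idea can be illustrated as chunked attention over the flattened spatio-temporal token axis of a time-series cube, so the full token-by-token attention matrix is never materialized at once. This is a minimal NumPy sketch under stated assumptions: the function names, the chunking scheme, and the single-head formulation are illustrative, not the paper's actual PTAViT3D implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # q, k, v: (N, d) tokens; materializes the full N x N attention matrix
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def chunked_attention(q, k, v, chunk=64):
    # memory-efficient variant: process query rows in chunks, so peak
    # memory is O(chunk * N) instead of O(N * N); output is identical
    out = np.empty_like(q)
    for i in range(0, q.shape[0], chunk):
        scores = q[i:i + chunk] @ k.T / np.sqrt(q.shape[-1])
        out[i:i + chunk] = softmax(scores, axis=-1) @ v
    return out

# A (T, H, W, C) satellite image time series flattened to spatio-temporal
# tokens, so attention jointly mixes across time and space:
T, H, W, C = 6, 8, 8, 16
cube = np.random.default_rng(0).standard_normal((T, H, W, C))
tokens = cube.reshape(T * H * W, C)  # (384, 16) tokens
fused = chunked_attention(tokens, tokens, tokens, chunk=64)
```

Chunking trades a small amount of recomputation overhead for a peak memory that no longer scales quadratically with the number of spatio-temporal tokens, which is what makes joint (rather than factorized) space-time attention tractable on long time series.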
Contribution/Results: The method achieves S2-level spatial accuracy using S1 data alone—even under dense cloud cover—marking the first such demonstration. It maintains high stability and accuracy across both sparse and dense cloud scenarios. Deployed in the ePaddocks system, it has enabled nationwide field boundary mapping across Australia, significantly advancing automated, cloud-agnostic boundary mapping for digital agriculture.
📝 Abstract
Accurate field boundary delineation is a critical challenge in digital agriculture, impacting everything from crop monitoring to resource management. Existing methods often struggle with noise and fail to generalize across varied landscapes, particularly under cloud cover in optical remote sensing. In response, this study presents a new approach that leverages time-series data from Sentinel-2 (S2) and Sentinel-1 (S1) imagery to improve performance under diverse cloud conditions, without the need for manual cloud filtering. We introduce a 3D Vision Transformer architecture specifically designed for satellite image time series, incorporating a memory-efficient attention mechanism. Two models are proposed: PTAViT3D, which processes either S2 or S1 data independently, and PTAViT3D-CA, which fuses both datasets to enhance accuracy. Both models are evaluated under sparse and dense cloud coverage by exploiting spatio-temporal correlations. Our results demonstrate that the models can effectively delineate field boundaries even under partial cloud cover (using S2 alone or S2–S1 fusion) or dense cloud cover (using S1 alone), with the S1-based model providing performance comparable to S2 imagery in terms of spatial resolution. A key strength of this approach lies in its capacity to directly process cloud-contaminated imagery by leveraging spatio-temporal correlations in a memory-efficient manner. This methodology, used in the ePaddocks product to map Australia's national field boundaries, offers a robust, scalable solution adaptable to varying agricultural environments, delivering precision and reliability where existing methods falter. Our code is available at https://github.com/feevos/tfcl.
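The "-CA" in PTAViT3D-CA suggests cross-attention fusion: optical (S2) tokens query the SAR (S1) token stream, so S2 features are enriched with cloud-penetrating S1 context. The sketch below is a minimal single-head NumPy illustration of that pattern; the random projection weights, dimensions, and function name are hypothetical stand-ins for what would be learned parameters in the actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(s2_tokens, s1_tokens, d=32, seed=0):
    """Fuse S1 context into S2 tokens via single-head cross-attention.

    s2_tokens: (N, c2) optical tokens (queries)
    s1_tokens: (M, c1) SAR tokens (keys/values)
    Returns (N, d) fused tokens. Weights here are random placeholders;
    in a trained model they would be learned projections.
    """
    rng = np.random.default_rng(seed)
    c2, c1 = s2_tokens.shape[-1], s1_tokens.shape[-1]
    Wq = rng.standard_normal((c2, d))
    Wk = rng.standard_normal((c1, d))
    Wv = rng.standard_normal((c1, d))
    q = s2_tokens @ Wq          # optical queries
    k = s1_tokens @ Wk          # SAR keys
    v = s1_tokens @ Wv          # SAR values
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, M)
    return attn @ v             # S2 tokens enriched with S1 context

# Two co-registered time-series cubes flattened to tokens:
rng = np.random.default_rng(0)
s2 = rng.standard_normal((4 * 8 * 8, 10))  # e.g. 10 optical bands
s1 = rng.standard_normal((6 * 8 * 8, 2))   # e.g. VV/VH backscatter
fused = cross_attention_fuse(s2, s1)       # (256, 32)
```

Because the query and key/value streams come from different sensors, the two time series need not share a temporal grid or channel count, which is convenient for fusing irregularly sampled S1 and S2 acquisitions.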