Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

📅 2025-03-14
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
In agricultural remote sensing, crop classification performance is significantly degraded by erroneous ground truth (GT) labels, such as mislabeled crops and misidentified land. To address this, we propose a multi-stage GT cleaning framework that uses multi-temporal Sentinel-2 imagery. The method combines unsupervised anomaly detection on learned farmland embeddings, clustering of crop growth profiles, and distance-based metrics, together with False Colour Composite (FCC) visual verification of the clusters, to improve GT reliability. In the reported experiments, training a random forest classifier on the cleaned GT yields an F1 score up to 70 percentage points higher. The work is positioned as a foundation for more accurate crop mapping, agricultural credit risk assessment, and agricultural decision support.
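The reported gain is an absolute difference in F1, expressed in percentage points, which is not the same as a relative improvement. A minimal sketch of that arithmetic follows; the confusion counts are invented purely for illustration and are not results from the paper, and the function names are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def percentage_point_gain(f1_before: float, f1_after: float) -> float:
    """Absolute difference between two F1 scores, in percentage points."""
    return (f1_after - f1_before) * 100.0


# Hypothetical counts for one crop class (illustration only, not paper data).
f1_noisy = f1_score(tp=20, fp=60, fn=60)   # model trained on uncleaned GT
f1_clean = f1_score(tp=90, fp=10, fn=10)   # model trained on cleaned GT
gain_pp = percentage_point_gain(f1_noisy, f1_clean)
print(f"F1 noisy={f1_noisy:.2f}, clean={f1_clean:.2f}, gain={gain_pp:.0f} pp")
```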

📝 Abstract
In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification, yet crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework that uses multi-temporal Sentinel-2 data to address these issues. Specifically, the framework generates embeddings for farmland, clusters similar crop profiles, and identifies outliers that indicate GT errors. We validated the clusters with False Colour Composite (FCC) checks and used distance-based metrics to scale and automate this verification. The importance of cleaning the GT data became apparent when models trained on the cleaned and uncleaned data were compared. For instance, a Random Forest model trained on the clean GT data achieved an F1 score up to 70 percentage points higher. This approach advances crop classification methodologies, with potential applications in loan underwriting and agricultural decision-making.
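To make the idea of distance-based outlier identification concrete, here is a minimal sketch under stated assumptions. It is not the authors' pipeline: it skips embedding learning and clustering, and instead compares each field's temporal profile with the per-date median profile of its recorded crop label, flagging fields whose Euclidean distance exceeds a robust (median plus scaled MAD) threshold. The function name, the threshold rule, the crop names, and the synthetic profiles are all hypothetical.

```python
import math
import random
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_suspect_labels(profiles, labels, k=3.0):
    """Return one boolean per field: True if its profile is far from the
    median profile of the fields sharing its label (assumed rule, not the paper's)."""
    suspect = [False] * len(labels)
    for crop in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == crop]
        # Per-date median over all fields recorded with this label.
        centre = [median(list(col)) for col in zip(*(profiles[i] for i in idx))]
        dist = [euclidean(profiles[i], centre) for i in idx]
        med = median(dist)
        mad = median([abs(d - med) for d in dist])
        threshold = med + k * 1.4826 * mad  # 1.4826 scales MAD to a std-dev estimate
        for i, d in zip(idx, dist):
            suspect[i] = d > threshold
    return suspect


# Synthetic growth curves: two crops peaking at different times of the season.
random.seed(0)
dates = [i / 11 for i in range(12)]


def bump(peak):
    return [math.exp(-((t - peak) / 0.15) ** 2) for t in dates]


def noisy(profile):
    return [v + random.gauss(0, 0.02) for v in profile]


profiles = [noisy(bump(0.3)) for _ in range(20)] + [noisy(bump(0.7)) for _ in range(20)]
labels = ["wheat"] * 20 + ["rice"] * 20
profiles[0] = noisy(bump(0.7))  # a rice-like field wrongly recorded as wheat
suspect = flag_suspect_labels(profiles, labels)
print([i for i, s in enumerate(suspect) if s])
```

Flagged fields would still need review, for example against an FCC image as the abstract describes; the threshold multiplier `k` trades missed errors against false alarms.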
Problem

Research questions and friction points this paper is trying to address.

Addresses crop mislabeling and incorrect land identification in agricultural management.
Proposes a multi-level framework to clean Ground Truth data using Sentinel-2 images.
Improves crop classification accuracy, with potential benefits for loan underwriting and agricultural decision-making.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level GT cleaning framework
Utilizes multi-temporal Sentinel-2 data
Distance-based metrics for automated verification
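As an illustration of what a multi-temporal profile and an FCC input might look like, the sketch below builds a per-field NDVI time series from Sentinel-2 near-infrared (B8) and red (B4) reflectances and arranges bands in the common NIR/red/green false-colour order. The source does not state which index or band combination the authors used, so NDVI, the band choice, the function names, and the sample values are assumptions.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def temporal_profile(observations):
    """Turn (ISO date, B8 reflectance, B4 reflectance) tuples into a
    date-ordered list of (date, NDVI) pairs for one field."""
    return [(date, ndvi(nir, red)) for date, nir, red in sorted(observations)]


def false_colour_pixel(nir: float, red: float, green: float):
    """Common false-colour mapping: NIR -> red channel, red -> green channel,
    green -> blue channel (B8, B4, B3 for Sentinel-2)."""
    return (nir, red, green)


# Hypothetical field-averaged reflectances on two acquisition dates.
observations = [
    ("2024-02-10", 0.45, 0.05),
    ("2024-01-01", 0.30, 0.10),
]
profile = temporal_profile(observations)
print(profile)
```

Profiles of this kind, one per field, are the sort of input that a clustering or distance-based check could operate on.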
Amoolya Shetty
SatSure Analytics India Pvt Ltd
Abhijeet Sharma
SatSure Analytics India Pvt Ltd
Masthan Venkatesh Ravichandran
SatSure Analytics India Pvt Ltd
Wali Gosuvarapalli
SatSure Analytics India Pvt Ltd
Sarthak Jain
SatSure Analytics India Pvt Ltd
Priyamvada Nanjundiah
SatSure Analytics India Pvt Ltd
Ujjal Kr Dutta
SatSure Analytics India Pvt Ltd
Divya Sharma
Google
Privacy, Security, Machine learning, Causality, User Experience