AI Summary
This study addresses the poor generalization of agricultural vision models under real field conditions, which is often caused by distribution shifts between training and deployment data, a challenge that existing competitions overlook because they treat datasets as fixed resources and neglect the role of data collection practices. To this end, the authors run a data-centric competition, the AgrI Challenge, in which 12 teams independently collect field data, yielding a heterogeneous dataset of 50,673 tree images covering six species. They propose a Cross-Team Validation (CTV) evaluation framework with two protocols, TOTO (Train-on-One-Team-Only) and LOTO (Leave-One-Team-Out), to systematically assess cross-source generalization. Experiments show that models trained on a single source exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer), whereas multi-source collaborative training reduces these gaps to 2.82% and 1.78%, respectively. The work highlights the critical role of data collection diversity in model robustness and releases the dataset publicly.
Abstract
Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV includes two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gap to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.
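The two CTV protocols can be illustrated with a minimal sketch of how the train/test partitions over teams might be enumerated. This is not the authors' code: the function names, the placeholder team identifiers, and the choice to evaluate TOTO on all remaining teams are assumptions made for illustration, and the gap is assumed here to be validation accuracy minus cross-team test accuracy, in percentage points.

```python
from typing import Iterator, List, Tuple

# A split is (teams used for training, teams used for testing).
Split = Tuple[List[str], List[str]]


def toto_splits(teams: List[str]) -> Iterator[Split]:
    """Train-on-One-Team-Only: train on one team, test on the others."""
    for team in teams:
        yield [team], [t for t in teams if t != team]


def loto_splits(teams: List[str]) -> Iterator[Split]:
    """Leave-One-Team-Out: train on all teams but one, test on the held-out team."""
    for team in teams:
        yield [t for t in teams if t != team], [team]


def validation_test_gap(val_acc: float, test_acc: float) -> float:
    """Assumed gap definition: in-source validation accuracy minus
    cross-team test accuracy, both given in percent."""
    return val_acc - test_acc


# Placeholder identifiers for the twelve teams (hypothetical names).
teams = [f"team_{i:02d}" for i in range(1, 13)]
toto = list(toto_splits(teams))
loto = list(loto_splits(teams))
```

Each protocol yields twelve splits, one per team. In TOTO the training side holds a single team, and in LOTO the test side does, which is why the former probes single-source generalization and the latter probes collaborative multi-source training.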