🤖 AI Summary
General-purpose pre-trained vision models generalize poorly in wheat crop monitoring, largely because fine-grained, highly variable wheat canopy structures are tightly coupled with constantly shifting field conditions.
Method: To address this, we introduce FoMo4Wheat, the first wheat-specific vision foundation model, trained via self-supervised learning on a Transformer (ViT) architecture using the ImAg4Wheat dataset of 2.5 million high-resolution wheat images collected across 30 global field sites.
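For orientation, the sketch below shows one common way such self-supervised pretraining is implemented: a student/teacher pair in which the teacher is an exponential-moving-average copy of the student, and the student learns to match the teacher's output across two augmented views of the same images. This is a minimal toy illustration, not the actual FoMo4Wheat recipe; the tiny stand-in encoder, the temperatures (0.04/0.1), and the EMA rate are all placeholder choices.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn

# Stand-in encoder; the real model would be a ViT backbone.
def make_encoder() -> nn.Module:
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))

student = make_encoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False  # the teacher is updated only by EMA, never by gradients

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

for step in range(10):  # toy loop on synthetic data
    base = torch.randn(8, 3, 64, 64)
    # Two noisy "views" of the same batch stand in for real augmentations.
    view1 = base + 0.1 * torch.randn_like(base)
    view2 = base + 0.1 * torch.randn_like(base)

    with torch.no_grad():
        targets = F.softmax(teacher(view1) / 0.04, dim=-1)  # sharpened teacher output
    log_preds = F.log_softmax(student(view2) / 0.1, dim=-1)
    loss = -(targets * log_preds).sum(dim=-1).mean()  # cross-view distillation loss

    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():  # exponential-moving-average update of the teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(0.996).add_(ps, alpha=0.004)
```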
Contribution/Results: The model achieves significantly improved cross-site and cross-task transferability across ten diverse field-level visual tasks spanning both canopy- and organ-level analysis. It consistently outperforms general-domain pretrained backbones (e.g., ViT and ResNet variants) on all evaluated tasks, establishing a scalable, domain-adapted representation paradigm for crop perception. This work provides a foundational framework for developing specialized agricultural vision models with enhanced robustness and generalizability under real-world farming conditions.
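One standard way to exercise such a pretrained representation on a downstream task is a linear probe: freeze the backbone and train only a lightweight head. The sketch below is a self-contained illustration of that setup, assuming the backbone maps an image batch to a (batch, feature) tensor; the toy encoder is a placeholder for a real pretrained backbone.

```python
import torch
from torch import nn

class LinearProbe(nn.Module):
    """A frozen pretrained backbone with a small trainable task head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the pretrained representation fixed
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # features come from the frozen encoder
            feats = self.backbone(x)
        return self.head(feats)

# Placeholder encoder standing in for a pretrained backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
probe = LinearProbe(encoder, feat_dim=768, num_classes=5)

logits = probe(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```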
📝 Abstract
Vision-driven field monitoring is central to digital agriculture, yet models built on general-domain pretrained backbones often fail to generalize across tasks, owing to the interaction of fine-grained, variable canopy structures with fluctuating field conditions. We present FoMo4Wheat, one of the first crop-domain vision foundation models, pretrained with self-supervision on ImAg4Wheat, the largest and most diverse wheat image dataset to date (2.5 million high-resolution images collected over a decade at 30 global sites, spanning >2,000 genotypes and >500 environmental conditions). This wheat-specific pretraining yields representations that are robust for wheat and transferable to other crops and weeds. Across ten in-field vision tasks at canopy and organ levels, FoMo4Wheat models consistently outperform state-of-the-art models pretrained on general-domain datasets. These results demonstrate the value of crop-specific foundation models for reliable in-field perception and chart a path toward a universal crop foundation model with cross-species and cross-task capabilities. FoMo4Wheat models and the ImAg4Wheat dataset are publicly available online: https://github.com/PheniX-Lab/FoMo4Wheat and https://huggingface.co/PheniX-Lab/FoMo4Wheat. The demonstration website is: https://fomo4wheat.phenix-lab.com/.
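Below is a minimal loading sketch, assuming the published checkpoints are plain PyTorch state dicts compatible with a timm ViT. The checkpoint filename and the architecture name are guesses for illustration only; consult the FoMo4Wheat GitHub README for the actual loading procedure.

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Both the filename and the architecture name below are hypothetical
# placeholders; check the repository for the actual checkpoint names.
ckpt_path = hf_hub_download(
    repo_id="PheniX-Lab/FoMo4Wheat",
    filename="fomo4wheat_vit.pth",  # hypothetical checkpoint name
)

# Assumed ViT backbone; num_classes=0 returns pooled features instead of logits.
model = timm.create_model("vit_base_patch14_dinov2", num_classes=0)
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False: key names may differ
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 518, 518))  # 518x518 is this timm model's default
print(feats.shape)  # e.g., torch.Size([1, 768])
```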