🤖 AI Summary
General-purpose pre-trained vision models generalize poorly in wheat crop monitoring, largely because fine-grained, highly variable wheat canopy structures are tightly coupled with constantly shifting field conditions.
Method: To address this, we introduce FoMo4Wheat, the first wheat-specific vision foundation model, trained via self-supervised learning on a Transformer (ViT) architecture using the ImAg4Wheat dataset of 2.5 million high-resolution wheat images collected across 30 global field sites.
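For orientation, the sketch below shows one common way such self-supervised pretraining is implemented: a student/teacher pair in which the teacher is an exponential-moving-average copy of the student, and the student learns to match the teacher's output across two augmented views of the same images. This is a minimal toy illustration, not the actual FoMo4Wheat recipe; the tiny stand-in encoder, the temperatures (0.04/0.1), and the EMA rate are all placeholder choices.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn

# Stand-in encoder; the real model would be a ViT backbone.
def make_encoder() -> nn.Module:
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))

student = make_encoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False  # the teacher is updated only by EMA, never by gradients

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

for step in range(10):  # toy loop on synthetic data
    base = torch.randn(8, 3, 64, 64)
    # Two noisy "views" of the same batch stand in for real augmentations.
    view1 = base + 0.1 * torch.randn_like(base)
    view2 = base + 0.1 * torch.randn_like(base)

    with torch.no_grad():
        targets = F.softmax(teacher(view1) / 0.04, dim=-1)  # sharpened teacher output
    log_preds = F.log_softmax(student(view2) / 0.1, dim=-1)
    loss = -(targets * log_preds).sum(dim=-1).mean()  # cross-view distillation loss

    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():  # exponential-moving-average update of the teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(0.996).add_(ps, alpha=0.004)
```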
Contribution/Results: The model achieves significantly improved cross-site and cross-task transferability across ten diverse field-level visual tasks spanning both canopy- and organ-level analysis. It consistently outperforms general-domain pretrained backbones (e.g., ViT and ResNet variants) on all evaluated tasks, establishing a scalable, domain-adapted representation paradigm for crop perception. This work provides a foundational framework for developing specialized agricultural vision models with enhanced robustness and generalizability under real-world farming conditions.
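One standard way to exercise such a pretrained representation on a downstream task is a linear probe: freeze the backbone and train only a lightweight head. The sketch below is a self-contained illustration of that setup, assuming the backbone maps an image batch to a (batch, feature) tensor; the toy encoder is a placeholder for a real pretrained backbone.

```python
import torch
from torch import nn

class LinearProbe(nn.Module):
    """A frozen pretrained backbone with a small trainable task head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the pretrained representation fixed
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # features come from the frozen encoder
            feats = self.backbone(x)
        return self.head(feats)

# Placeholder encoder standing in for a pretrained backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
probe = LinearProbe(encoder, feat_dim=768, num_classes=5)

logits = probe(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```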
📝 Abstract
Vision-driven field monitoring is central to digital agriculture, yet models built on general-domain pretrained backbones often fail to generalize across tasks, owing to the interaction of fine-grained, variable canopy structures with fluctuating field conditions. We present FoMo4Wheat, one of the first crop-domain vision foundation models, pretrained with self-supervision on ImAg4Wheat, the largest and most diverse wheat image dataset to date (2.5 million high-resolution images collected over a decade at 30 global sites, spanning >2,000 genotypes and >500 environmental conditions). This wheat-specific pretraining yields representations that are robust for wheat and transferable to other crops and weeds. Across ten in-field vision tasks at canopy and organ levels, FoMo4Wheat models consistently outperform state-of-the-art models pretrained on general-domain datasets. These results demonstrate the value of crop-specific foundation models for reliable in-field perception and chart a path toward a universal crop foundation model with cross-species and cross-task capabilities. FoMo4Wheat models and the ImAg4Wheat dataset are publicly available online: https://github.com/PheniX-Lab/FoMo4Wheat and https://huggingface.co/PheniX-Lab/FoMo4Wheat. The demonstration website is: https://fomo4wheat.phenix-lab.com/.
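Below is a minimal loading sketch, assuming the published checkpoints are plain PyTorch state dicts compatible with a timm ViT. The checkpoint filename and the architecture name are guesses for illustration only; consult the FoMo4Wheat GitHub README for the actual loading procedure.

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Both the filename and the architecture name below are hypothetical
# placeholders; check the repository for the actual checkpoint names.
ckpt_path = hf_hub_download(
    repo_id="PheniX-Lab/FoMo4Wheat",
    filename="fomo4wheat_vit.pth",  # hypothetical checkpoint name
)

# Assumed ViT backbone; num_classes=0 returns pooled features instead of logits.
model = timm.create_model("vit_base_patch14_dinov2", num_classes=0)
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False: key names may differ
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 518, 518))  # 518x518 is this timm model's default
print(feats.shape)  # e.g., torch.Size([1, 768])
```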