🤖 AI Summary
Existing plant disease identification models struggle to generalize across crops, pathogens, and field conditions due to the absence of standardized large-scale annotated datasets and structured symptom knowledge. To address this, this work presents the largest plant disease image–symptom dataset to date and introduces a training-free autonomous visual reasoning agent for interpretable, zero-shot disease diagnosis. The proposed method combines an automated data pipeline that extracts crop-specific symptom knowledge from the web with citation grounding, anatomical context recognition driven by a vision-language model, and reference image comparison. Experimental results demonstrate a consistent performance gain across four crops, with an average accuracy improvement of 16.2 percentage points when full reference resources are available.
📝 Abstract
Plant disease diagnosis is critical for food security, yet training disease-recognition models that generalize across crops, pathogens, and field conditions remains challenging because labeled disease images are far less abundant and standardized than data for other biotic stresses such as insects or weeds. Frontier vision-language models offer new opportunities through improved visual reasoning, but they still struggle with fine-grained disease identification due to the lack of structured, crop-specific symptom knowledge. To address this gap, we curate the largest plant disease image-symptom dataset to date, covering 335 crops, 1,251 disease classes, and approximately 839K images, designed to support training-free, agentic disease prediction. A scalable automated pipeline generates source-grounded symptom descriptions in which each claim is linked to a verbatim web quote; domain experts validate sampled crops and reconcile disease-name variants across sources. As a baseline, we introduce an autonomous visual reasoning agent that identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Incorporating symptom knowledge improves accuracy by 16.2 percentage points on average at the full reference budget, with consistent gains across all four evaluation crops. Because the framework only requires crop-specific reference images and symptom knowledge, it can be extended to new crops without retraining, while the agentic baseline can directly benefit from future improvements in foundation model capabilities. Dataset and code are available at: https://sage-dataset.github.io/.
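The agent's four stages (anatomical context, symptom-based narrowing, sequential reference comparison, reasoning trace) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `diagnose`, the `Diagnosis` record, the three callables standing in for vision-language model calls, and the fallback to the first remaining candidate when the budget runs out are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Diagnosis:
    """Hypothetical result record: predicted disease plus a readable trace."""
    disease: Optional[str]
    trace: List[str] = field(default_factory=list)


def diagnose(
    image: Any,
    symptom_kb: Dict[str, Any],          # disease name -> symptom knowledge entry
    reference_images: Dict[str, List[Any]],  # disease name -> reference images
    identify_part: Callable[[Any], str],             # stand-in for a VLM call
    matches_symptoms: Callable[[Any, str, Any], bool],  # stand-in for a VLM call
    same_disease: Callable[[Any, Any], bool],        # stand-in for a VLM call
    budget: int,                         # max number of reference comparisons
) -> Diagnosis:
    trace: List[str] = []

    # Stage 1: identify the anatomical context (leaf, stem, fruit, ...).
    part = identify_part(image)
    trace.append(f"anatomical context: {part}")

    # Stage 2: keep only diseases whose symptom knowledge fits the image.
    candidates = [
        name for name, entry in symptom_kb.items()
        if matches_symptoms(image, part, entry)
    ]
    trace.append(f"candidates after symptom knowledge: {candidates}")

    # Stage 3: compare reference images one at a time, within the budget.
    comparisons = 0
    for name in candidates:
        for ref in reference_images.get(name, []):
            if comparisons >= budget:
                # Assumed fallback: first remaining candidate.
                trace.append("reference budget exhausted")
                return Diagnosis(candidates[0], trace)
            comparisons += 1
            if same_disease(image, ref):
                trace.append(f"reference {ref!r} matched: {name}")
                return Diagnosis(name, trace)
            trace.append(f"reference {ref!r} did not match: {name}")

    # Stage 4: no reference matched; report the fallback and the full trace.
    fallback = candidates[0] if candidates else None
    trace.append(f"no reference matched; fallback: {fallback}")
    return Diagnosis(fallback, trace)
```

Because no stage involves training, supporting a new crop in this sketch only means adding entries to `symptom_kb` and `reference_images`, which mirrors the extensibility claim in the abstract.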