🤖 AI Summary
To address the low accuracy and poor domain adaptability of vision-language models (VLMs) in precision agriculture, particularly for plant disease identification and treatment recommendation, this paper proposes a lightweight vision-language framework that integrates prompt engineering with a self-consistency mechanism. Specifically, it designs an expert-role prompt grounded in plant pathology knowledge, which configures a language model to assess model outputs. It also implements a cosine-similarity-based self-voting mechanism that generates multiple responses and selects the most consistent one, making inference more robust. Finally, it fine-tunes PaliGemma for agricultural applications and adapts the embeddings to the domain. The framework handles disease diagnosis, symptom analysis, and treatment recommendation together. On a maize leaf disease dataset, it raises diagnostic accuracy from 82.2% (greedy decoding) to 87.8% and reaches F1 scores of 52.2% (up from 38.9%) for symptom analysis and 43.3% (up from 27.8%) for treatment recommendation. The model is compact enough for real-time deployment on mobile devices, which makes its outputs more reliable and more useful for decision-making in complex field conditions.
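The expert-role prompt can be illustrated with a minimal sketch of the evaluation use described in the abstract, where a language model acting as a plant pathologist grades a candidate answer against a reference. Everything here is an assumption for illustration. The prompt wording in `EXPERT_ROLE`, the binary `SCORE: 0/1` reply format, and the helper names `build_expert_prompt`, `parse_score`, and `expert_accuracy` are hypothetical. `judge` stands in for whatever language-model call is actually used.

```python
import re
from typing import Callable, Sequence, Tuple

# Hypothetical expert-role instructions; not the paper's actual prompt text.
EXPERT_ROLE = (
    "You are an expert plant pathologist specializing in maize leaf diseases. "
    "Compare the candidate answer with the reference answer and judge whether "
    "the candidate is agronomically correct. End your reply with 'SCORE: 1' "
    "if it is correct or 'SCORE: 0' if it is not."
)

# Hypothetical reply format: the judge ends with "SCORE: 0" or "SCORE: 1".
_SCORE_RE = re.compile(r"SCORE:\s*([01])\b")


def build_expert_prompt(question: str, reference: str, candidate: str) -> str:
    """Assemble one grading prompt for the judge language model."""
    return (
        f"{EXPERT_ROLE}\n\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
    )


def parse_score(reply: str) -> int:
    """Extract the binary verdict; an unparseable reply counts as incorrect."""
    match = _SCORE_RE.search(reply)
    return int(match.group(1)) if match else 0


def expert_accuracy(items: Sequence[Tuple[str, str, str]], judge: Callable[[str], str]) -> float:
    """Fraction of (question, reference, candidate) triples the judge marks correct."""
    if not items:
        return 0.0
    scores = [parse_score(judge(build_expert_prompt(q, r, c))) for q, r, c in items]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Canned replies stand in for a real language-model call.
    canned = iter(["SCORE: 1", "The lesions described do not match. SCORE: 0"])
    items = [
        ("Which disease is shown?", "Northern leaf blight", "Northern leaf blight"),
        ("Which disease is shown?", "Common rust", "Gray leaf spot"),
    ]
    print(expert_accuracy(items, lambda prompt: next(canned)))
```

Treating an unparseable reply as incorrect is a conservative choice. A real protocol might instead re-query the judge or log the reply for manual review.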
📝 Abstract
Precision agriculture relies heavily on accurate image analysis for crop disease identification and treatment recommendation, yet existing vision-language models (VLMs) often underperform in specialized agricultural domains. This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms to make VLMs more reliable in precision agriculture applications. We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images and selects the most semantically coherent diagnosis using domain-adapted embeddings. We apply the approach to maize leaf disease identification from field images with a fine-tuned PaliGemma model. Compared to standard greedy decoding, it improves diagnostic accuracy from 82.2% to 87.8%, symptom analysis from 38.9% to 52.2%, and treatment recommendation from 27.8% to 43.3%. The system remains compact enough for deployment on mobile devices, supporting real-time agricultural decision-making in resource-constrained environments. These results point to the potential of AI-driven precision agriculture tools that operate reliably under diverse field conditions.
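The cosine-consistency self-voting step can be sketched as a medoid-style selection: embed every sampled candidate, compute pairwise cosine similarities, and keep the candidate whose mean similarity to the others is highest. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the candidates have already been sampled from the VLM, and `bag_of_words` is a hypothetical toy embedding standing in for the domain-adapted embedding model. The "highest mean similarity" rule is one plausible reading of "most semantically coherent".

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Sequence

# Sparse vector: token -> weight.
Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Hypothetical toy embedding: lowercase word counts (stand-in for a domain-adapted encoder)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {token: float(count) for token, count in counts.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_self_vote(candidates: Sequence[str], embed: Callable[[str], Vector] = bag_of_words) -> str:
    """Pick the candidate with the highest mean cosine similarity to all other candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    vectors = [embed(c) for c in candidates]
    best_index, best_score = 0, -math.inf
    for i, vi in enumerate(vectors):
        # Mean agreement of candidate i with every other sampled response.
        score = sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i) / (len(vectors) - 1)
        if score > best_score:  # strict '>' keeps the earliest candidate on ties
            best_index, best_score = i, score
    return candidates[best_index]


if __name__ == "__main__":
    samples = [
        "Northern leaf blight: long gray-green lesions; apply a fungicide.",
        "Northern leaf blight with cigar-shaped lesions; apply fungicide.",
        "Common rust with reddish-brown pustules.",
    ]
    print(cosine_self_vote(samples))
```

With these three samples, the two northern leaf blight answers agree strongly with each other. The second one also shares a word with the rust answer, so it has the highest mean similarity and is the one printed. Swapping in a real sentence encoder would mean passing a different `embed` function and adapting `cosine` to dense vectors.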