Leveraging Vision Language Models for Specialized Agricultural Tasks

📅 2024-07-29
📈 Citations: 1
Influential: 0
🤖 AI Summary
Agricultural applications face severe challenges due to scarce labeled data and the difficulty of identifying plant stress phenotypes. Method: This work introduces AgEval, the first agriculture-specific vision language model (VLM) evaluation benchmark, covering 12 stress phenotyping tasks and systematically assessing zero-shot and few-shot (1-8 examples) generalization of leading VLMs (e.g., Claude, GPT, Gemini, LLaVA). The authors propose a coefficient-of-variation (CV) based metric to quantify cross-category performance disparity. Contribution/Results: The study reveals significant category bias in agricultural VLM performance (CV = 26.02%-58.03%), reported as the first such finding. Exemplars drawn from the exact target category improve average F1 by 15.38%; the best-performing model reaches 73.37% F1 in the 8-shot setting, a 27.13 percentage-point gain over its 46.24% zero-shot score. AgEval establishes a reproducible framework for evaluating agriculture-oriented VLMs.
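The CV-based disparity metric mentioned above can be sketched in a few lines. This is an illustrative sketch only: AgEval's exact definition (e.g., sample vs. population standard deviation, and which scores it aggregates) may differ, and the F1 values below are hypothetical.

```python
import statistics

def coefficient_of_variation(per_class_f1):
    """CV (%) = standard deviation / mean * 100, computed over per-class
    F1 scores. A higher CV means performance is more uneven across the
    stress categories."""
    mean = statistics.mean(per_class_f1)
    std = statistics.stdev(per_class_f1)  # sample std; the paper may use a different estimator
    return std / mean * 100

# Hypothetical per-class F1 scores for four stress categories
f1_scores = [0.90, 0.60, 0.75, 0.50]
cv = coefficient_of_variation(f1_scores)
```

On these made-up scores the CV comes out around 25%, i.e., within the 26.02%-58.03% disparity band the paper reports for real models.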

📝 Abstract
As Vision Language Models (VLMs) become increasingly accessible to farmers and agricultural experts, there is a growing need to evaluate their potential in specialized tasks. We present AgEval, a comprehensive benchmark for assessing VLMs' capabilities in plant stress phenotyping, offering a solution to the challenge of limited annotated data in agriculture. Our study explores how general-purpose VLMs can be leveraged for domain-specific tasks with only a few annotated examples, providing insights into their behavior and adaptability. AgEval encompasses 12 diverse plant stress phenotyping tasks, evaluating zero-shot and few-shot in-context learning performance of state-of-the-art models including Claude, GPT, Gemini, and LLaVA. Our results demonstrate VLMs' rapid adaptability to specialized tasks, with the best-performing model showing an increase in F1 scores from 46.24% to 73.37% in 8-shot identification. To quantify performance disparities across classes, we introduce metrics such as the coefficient of variation (CV), revealing that VLMs' training impacts classes differently, with CV ranging from 26.02% to 58.03%. We also find that strategic example selection enhances model reliability, with exact category examples improving F1 scores by 15.38% on average. AgEval establishes a framework for assessing VLMs in agricultural applications, offering valuable benchmarks for future evaluations. Our findings suggest that VLMs, with minimal few-shot examples, show promise as a viable alternative to traditional specialized models in plant stress phenotyping, while also highlighting areas for further refinement. Results and benchmark details are available at: https://github.com/arbab-ml/AgEval
Problem

Research questions and friction points this paper is trying to address.

Evaluate VLMs' potential in specialized agricultural tasks.
Address limited annotated data in plant stress phenotyping.
Assess VLMs' adaptability with few-shot learning in agriculture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

AgEval benchmark for VLM evaluation
Few-shot learning for plant phenotyping
Strategic example selection enhances reliability
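The few-shot evaluation setup the paper describes, interleaving k labeled exemplar images with the unlabeled query, can be sketched as a prompt-assembly helper. The message schema below is a hypothetical generic structure, not the actual API format of Claude, GPT, Gemini, or LLaVA, and `build_few_shot_prompt` is an illustrative name, not the paper's code.

```python
def build_few_shot_prompt(query_image, exemplars, k=8):
    """Assemble an interleaved image/text prompt: up to k labeled
    exemplar images followed by the unlabeled query image.
    `exemplars` is a list of (image, label) pairs; drawing exemplars
    from the exact target category is the strategy the paper reports
    as improving F1 by 15.38% on average."""
    messages = []
    for image, label in exemplars[:k]:
        messages.append({"role": "user", "content": [
            {"type": "image", "data": image},
            {"type": "text", "text": f"Stress category: {label}"},
        ]})
    # The query image comes last, with no label, so the model must predict it.
    messages.append({"role": "user", "content": [
        {"type": "image", "data": query_image},
        {"type": "text", "text": "Identify the plant stress category."},
    ]})
    return messages
```

In the 8-shot setting reported in the paper, k=8 exemplars precede the query; the zero-shot setting corresponds to an empty exemplar list.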
Authors
Muhammad Arbab Arshad
Iowa State University, USA
T. Jubery
Iowa State University, USA
Tirtho Roy
Iowa State University, USA
Rim Nassiri
Iowa State University, USA
Asheesh K. Singh
Professor, Iowa State University
cultivar development, plant breeding, cyber-agricultural systems, plant phenomics, plant genetics
Arti Singh
Department of Agronomy, Iowa State University of Science and Technology
Plant-based protein crop breeding, Phenomics, HTP, Machine Learning, Data Science
Chinmay Hegde
New York University
AI
B. Ganapathysubramanian
Iowa State University, USA
Aditya Balu
Iowa State University
A. Krishnamurthy
Iowa State University, USA
Soumik Sarkar
Iowa State University, USA