AgroBench: Vision-Language Model Benchmark in Agriculture

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language models (VLMs) exhibit near-random performance on fine-grained agricultural recognition tasks, particularly weed classification, owing to the lack of domain-specific, expert-annotated multimodal benchmarks. Method: We introduce AgroBench, the first agricultural multimodal evaluation benchmark systematically annotated by agronomy experts. It spans seven major agricultural domains, 203 crop categories, and 682 plant disease categories, combining broad category coverage, fine-grained taxonomy, and expert-level annotation quality. Built on AgroBench, we propose an agriculture-oriented VLM evaluation framework supporting fine-grained vision-language understanding and human-AI interaction assessment. Contribution/Results: Empirical evaluation reveals severe limitations of state-of-the-art open-source VLMs in fine-grained agricultural recognition, especially weed identification (accuracy below 20%). All datasets, evaluation protocols, and code are publicly released to provide a rigorous, reproducible benchmark and an optimization pathway for agricultural AI development.

📝 Abstract
Precise automated understanding of agricultural tasks such as disease identification is essential for sustainable crop production. Recent advances in vision-language models (VLMs) are expected to further expand the range of agricultural tasks by facilitating human-model interaction through easy, text-based communication. Here, we introduce AgroBench (Agronomist AI Benchmark), a benchmark for evaluating VLMs across seven agricultural topics, covering key areas in agricultural engineering and relevant to real-world farming. Unlike recent agricultural VLM benchmarks, AgroBench is annotated by expert agronomists. Our AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities. In our evaluation on AgroBench, we reveal that VLMs have room for improvement in fine-grained identification tasks. Notably, in weed identification, most open-source VLMs perform close to random. With our wide range of topics and expert-annotated categories, we analyze the types of errors made by VLMs and suggest potential pathways for future VLM development. Our dataset and code are available at https://dahlian00.github.io/AgroBenchPage/.
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLMs for agricultural task accuracy
Assessing fine-grained disease and crop identification
Analyzing VLM errors in weed recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-annotated VLM benchmark for agriculture
Covers 203 crops and 682 disease categories
Analyzes VLM errors for future improvements
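The "close to random" finding above can be made concrete: on a multiple-choice identification task with k options, random guessing yields roughly 1/k accuracy, so a model near that line has learned essentially nothing about the categories. Below is a minimal sketch of this comparison, assuming a multiple-choice protocol; the scoring function and the weed class names are illustrative, not AgroBench's actual evaluation code or category list.

```python
import random

def evaluate_mcq(model_answer, questions):
    """Fraction of multiple-choice questions answered correctly.

    `questions` is a list of dicts with keys "options" (the candidate
    class names) and "answer" (the correct one); `model_answer` maps a
    question to the model's chosen option.
    """
    correct = sum(model_answer(q) == q["answer"] for q in questions)
    return correct / len(questions)

# Toy 4-way weed-identification items (hypothetical class names).
questions = [
    {"options": ["crabgrass", "pigweed", "foxtail", "lambsquarters"],
     "answer": "pigweed"},
    {"options": ["crabgrass", "pigweed", "foxtail", "lambsquarters"],
     "answer": "foxtail"},
] * 50  # repeat to get a stable estimate

random.seed(0)
# A guessing "model": picks an option uniformly at random.
random_baseline = evaluate_mcq(lambda q: random.choice(q["options"]), questions)
print(f"random-guess accuracy: {random_baseline:.2f}")  # ~0.25 for 4 options
```

A VLM whose accuracy on such items is statistically indistinguishable from this baseline (or, as reported here, under 20% on weed identification) is failing the task outright rather than merely underperforming.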
Risa Shinoda
The University of Osaka
Agriculture, Computer Vision, Animal
Nakamasa Inoue
Tokyo Institute of Technology, National Institute of Advanced Industrial Science and Technology (AIST)
Hirokatsu Kataoka
AIST / University of Oxford
Computer Vision, Action Recognition, Action Prediction, Visual Pre-training, FDSL
Masaki Onishi
National Institute of Advanced Industrial Science and Technology (AIST)
Yoshitaka Ushiku
OMRON SINIC X