AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

📅 2025-11-28
🤖 AI Summary
Existing VQA benchmarks inadequately evaluate the logical reasoning and problem-solving capabilities of Vision-Language Models (VLMs) in complex agricultural scenarios. To address this, we introduce AgriCoT, the first Chain-of-Thought (CoT)-enhanced visual question answering dataset for agriculture, comprising 4,535 samples. AgriCoT is also the first to incorporate human-annotated CoT rationales into agricultural VLM evaluation, enabling fine-grained analysis of multimodal understanding and stepwise reasoning. Zero-shot evaluation across 26 state-of-the-art VLMs reveals that while proprietary models achieve higher answer accuracy, they exhibit substantial deficiencies in reasoning coherence and causal logic. This work bridges a critical gap in explainable reasoning assessment for agriculture and advances VLM evaluation from an "answer correctness" paradigm toward a "reasoning validity" paradigm, emphasizing not only what is answered, but how and why.

📝 Abstract
Recent advancements in Vision-Language Models (VLMs) have significantly transformed various industries. In agriculture, these dual-modal capabilities offer promising applications such as precision farming, crop monitoring, pest detection, and environmental sustainability. While several Visual Question Answering (VQA) datasets and benchmarks have been developed to evaluate VLM performance, they often fail to adequately assess the critical reasoning and problem-solving skills required in complex agricultural contexts. To address this gap, we introduce AgriCoT, a VQA dataset that incorporates Chain-of-Thought (CoT) reasoning, specifically designed to evaluate the reasoning capabilities of VLMs. With 4,535 carefully curated samples, AgriCoT offers a comprehensive and robust evaluation of reasoning abilities for VLMs, particularly in zero-shot scenarios, by focusing on their capacity to engage in logical reasoning and effective problem-solving. Our evaluations, conducted with 26 representative VLMs, including both proprietary and open-source models, reveal that while some proprietary models excel at answering questions, there is a significant gap in their reasoning capabilities. This underscores the importance of incorporating CoT for more precise and effective assessments. Our dataset is available at https://huggingface.co/datasets/wenyb/AgriCoT.
Problem

Research questions and friction points this paper is trying to address.

Evaluating reasoning capabilities of vision-language models in agriculture
Addressing gaps in assessing logical reasoning for agricultural problem-solving
Measuring reasoning performance in zero-shot scenarios for agricultural applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing AgriCoT dataset for VLM reasoning evaluation
Incorporating Chain-of-Thought reasoning in agricultural VQA
Evaluating 26 VLMs with zero-shot logical reasoning tests
Authors
Yibin Wen (Sun Yat-sen University)
Qingmei Li (Tsinghua University)
Zi Ye (Sun Yat-sen University)
Jiarui Zhang (Sun Yat-sen University)
Jing Wu (Sun Yat-sen University)
Zurong Mai (Sun Yat-sen University)
Shuohong Lou (Sun Yat-sen University)
Yuhang Chen (Sun Yat-sen University)
Henglian Huang (Sun Yat-sen University)
Xiaoya Fan (Southwest University)
Yang Zhang (Sun Yat-sen University)
Lingyuan Zhao (HuanTian Wisdom Technology Co., Ltd.)
Haohuan Fu (Tsinghua University)
Huang Jianxi (China Agricultural University, Southwest Jiaotong University)
Juepeng Zheng (Sun Yat-sen University, National Supercomputing Center in Shenzhen)