Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
The agricultural remote sensing (RS) community lacks a comprehensive, large-scale multimodal evaluation benchmark tailored for Large Multimodal Models (LMMs), suffering from narrow application scenarios, coarse-grained tasks, and insufficient coverage of cognitive dimensions. Method: We propose AgroMind, the first agricultural RS-specific multimodal benchmark, spanning four cognitive dimensions: spatial perception, object understanding, scene understanding, and reasoning. It comprises 13 fine-grained tasks, 25,026 question-answer pairs, and 15,556 multi-source RS images. We systematically design a domain-specific, multi-granularity cognitive task taxonomy, develop an automated, agriculture-aware question-generation pipeline, and establish a unified evaluation framework covering 18 open-source and 3 closed-source LMMs. Results: Experiments reveal that current LMMs significantly underperform humans in spatial reasoning and fine-grained identification, yet surpass human accuracy in crop classification. AgroMind provides the first reproducible, extensible evaluation standard for multimodal models in agricultural RS.

📝 Abstract
Large Multimodal Models (LMMs) have demonstrated capabilities across various domains, but comprehensive benchmarks for agricultural remote sensing (RS) remain scarce. Existing benchmarks designed for agricultural RS scenarios exhibit notable limitations, primarily insufficient scene diversity in the dataset and oversimplified task design. To bridge this gap, we introduce AgroMind, a comprehensive agricultural remote sensing benchmark covering four task dimensions: spatial perception, object understanding, scene understanding, and scene reasoning, with a total of 13 task types, ranging from crop identification and health monitoring to environmental analysis. We curate a high-quality evaluation set by integrating eight public datasets and one private farmland plot dataset, containing 25,026 QA pairs and 15,556 images. The pipeline begins with multi-source data preprocessing, including collection, format standardization, and annotation refinement. We then generate a diverse set of agriculturally relevant questions through the systematic definition of tasks. Finally, we employ LMMs for inference, generating responses and performing detailed examinations. We evaluated 18 open-source LMMs and 3 closed-source models on AgroMind. Experiments reveal significant performance gaps, particularly in spatial reasoning and fine-grained recognition; notably, human performance lags behind several leading LMMs. By establishing a standardized evaluation framework for agricultural RS, AgroMind reveals the limitations of LMMs in domain knowledge and highlights critical challenges for future work. Data and code can be accessed at https://rssysu.github.io/AgroMind/.
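The evaluation pipeline the abstract describes (load QA pairs, query a model per image-question pair, then score responses per task) can be sketched as follows. This is a minimal illustrative sketch only: the `QAPair` schema, the model interface, and the exact-match scoring rule are assumptions for illustration, not AgroMind's actual data format or evaluation code.

```python
# Hypothetical sketch of a benchmark evaluation loop: iterate over QA
# pairs, query a model with (image, question), and report exact-match
# accuracy overall and per task. The schema and mock model below are
# illustrative assumptions, not AgroMind's real format or API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAPair:
    image_path: str  # path to a remote-sensing image tile (assumed field)
    question: str    # e.g. "Which crop dominates this plot?"
    answer: str      # gold label, e.g. "maize"
    task: str        # one of the benchmark's fine-grained task types


def evaluate(qa_pairs: List[QAPair],
             model: Callable[[str, str], str]) -> Dict[str, object]:
    """Return overall and per-task exact-match accuracy."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for qa in qa_pairs:
        # Query the model with the image path and question text.
        pred = model(qa.image_path, qa.question).strip().lower()
        totals[qa.task] = totals.get(qa.task, 0) + 1
        if pred == qa.answer.strip().lower():
            correct[qa.task] = correct.get(qa.task, 0) + 1
    per_task = {t: correct.get(t, 0) / n for t, n in totals.items()}
    overall = sum(correct.values()) / max(sum(totals.values()), 1)
    return {"overall": overall, "per_task": per_task}


if __name__ == "__main__":
    # Trivial mock "model" that always answers "maize", for illustration.
    data = [
        QAPair("tile_001.png", "Which crop dominates?", "maize",
               "crop identification"),
        QAPair("tile_002.png", "Which crop dominates?", "wheat",
               "crop identification"),
    ]
    print(evaluate(data, lambda img, q: "maize")["overall"])
```

In practice the `model` callable would wrap an actual LMM inference call, and scoring for open-ended tasks would need answer normalization beyond exact match.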
Problem

Research questions and friction points this paper is trying to address.

Assessing LMMs' capability in agricultural remote sensing tasks
Addressing limitations in existing agricultural RS benchmarks
Evaluating LMMs' performance on diverse agricultural scene understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing AgroMind benchmark for agricultural remote sensing
Integrating multi-source datasets with 25,026 QA pairs
Evaluating 21 LMMs on diverse agricultural tasks
Qingmei Li, Tsinghua University (Remote Sensing, Spatial Analysis)
Yang Zhang, Sun Yat-Sen University
Zurong Mai, Sun Yat-Sen University
Yuhang Chen, Sun Yat-Sen University
Shuohong Lou, Sun Yat-Sen University
Henglian Huang, Sun Yat-Sen University
Jiarui Zhang, Sun Yat-Sen University
Zhiwei Zhang, Sun Yat-Sen University
Yibin Wen, Sun Yat-Sen University
Weijia Li, Sun Yat-Sen University
Haohuan Fu, Tsinghua University
Jianxi Huang, Professor at China Agricultural University (Data assimilation, Climate change, Agricultural remote sensing, Crop modeling with remote sensing data assimilation, Crop yield)
Juepeng Zheng, Sun Yat-Sen University, National Supercomputing Center in Shenzhen