From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping

Date: 2026-04-10

AI Summary
This work addresses the difficulty general-purpose multimodal models have in combining domain-specific agronomic knowledge with the fine-grained visual reasoning that plant phenotyping requires. To bridge this gap, the authors propose PlantXpert, the first structured and reproducible multimodal benchmark tailored for soybean and cotton, comprising 385 drone-captured images and over 3,000 annotated samples spanning critical tasks such as disease, pest, and weed detection and yield estimation. The study evaluates 11 state-of-the-art vision-language models on visual, quantitative, and multi-step reasoning; with task-specific fine-tuning, models such as Qwen3-VL-4B and Qwen3-VL-30B reach up to 78% accuracy. However, the results show diminishing returns from model scaling and uneven generalization between soybean and cotton, indicating that robust quantitative and biologically grounded reasoning remains a fundamental challenge in agricultural AI.

Abstract
To improve crop genetics, high-throughput, effective and comprehensive phenotyping is a critical prerequisite. While such tasks were traditionally performed manually, recent advances in multimodal foundation models, especially in vision-language models (VLMs), have enabled more automated and robust phenotypic analysis. However, plant science remains a particularly challenging domain for foundation models because it requires domain-specific knowledge, fine-grained visual interpretation, and complex biological and agronomic reasoning. To address this gap, we develop PlantXpert, an evidence-grounded multimodal reasoning benchmark for soybean and cotton phenotyping. Our benchmark provides a structured and reproducible framework for agronomic adaptation of VLMs, and enables controlled comparison between base models and their domain-adapted counterparts. We constructed a dataset comprising 385 digital images and more than 3,000 benchmark samples spanning key plant science domains including disease, pest control, weed management, and yield. The benchmark can assess diverse capabilities including visual expertise, quantitative reasoning, and multi-step agronomic reasoning. A total of 11 state-of-the-art VLMs were evaluated. The results indicate that task-specific fine-tuning leads to substantial improvement in accuracy, with models such as Qwen3-VL-4B and Qwen3-VL-30B achieving up to 78%. At the same time, gains from model scaling diminish beyond a certain capacity, generalization across soybean and cotton remains uneven, and quantitative as well as biologically grounded reasoning continue to pose substantial challenges. These findings suggest that PlantXpert can serve as a foundation for assessing evidence-grounded agronomic reasoning and for advancing multimodal model development in plant science.
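The controlled comparison between a base model and its domain-adapted counterpart can be pictured with a minimal sketch. It assumes multiple-choice answers scored by exact match and grouped by domain; the record layout, the `accuracy_by_domain` helper, and the toy records are all hypothetical illustrations, not the benchmark's actual data or scoring code.

```python
from collections import defaultdict

# Hypothetical records, NOT benchmark data:
# (domain, gold answer, base-model answer, adapted-model answer)
RECORDS = [
    ("disease", "B", "A", "B"),
    ("disease", "C", "C", "C"),
    ("pest", "A", "B", "A"),
    ("weed", "D", "D", "D"),
    ("yield", "B", "A", "C"),
    ("yield", "A", "A", "A"),
]


def accuracy_by_domain(records, answer_index):
    """Return {domain: fraction of samples whose answer equals the gold answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        domain, gold = record[0], record[1]
        total[domain] += 1
        if record[answer_index] == gold:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Column 2 holds the base model's answers, column 3 the adapted model's.
base = accuracy_by_domain(RECORDS, 2)
adapted = accuracy_by_domain(RECORDS, 3)

for domain in base:
    print(f"{domain}: base={base[domain]:.2f} adapted={adapted[domain]:.2f}")
```

Reporting accuracy per domain rather than as a single pooled number keeps uneven gains visible, for example a model that improves on disease questions but not on yield questions.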
Problem

Research questions and friction points this paper is trying to address.

plant phenotyping
agronomic reasoning
multimodal LLM
vision-language models
UAV imagery
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal reasoning
vision-language models
plant phenotyping
agronomic reasoning
domain adaptation
Yu Wu
University of Cambridge
machine learning, health sensing, mobile health
Guangzeng Han
Computer Science, University of Memphis, Memphis, 38111, TN, United States
Ibra Niang Niang
Computer Science, University of Memphis, Memphis, 38111, TN, United States
Francia Ravelombola
Fisher Delta Research Extension and Education Center, University of Missouri, Portageville, 63873, MO, United States
Maiara Oliveira
Fisher Delta Research Extension and Education Center, University of Missouri, Portageville, 63873, MO, United States
Jason Davis
Associate Professor of Entrepreneurship, INSEAD
Innovation, Strategy, Organization Theory, Networks, Collaboration
Dong Chen
Assistant Professor, Mississippi State University
Reinforcement Learning, Robotics, Smart Agriculture
Feng Lin
Fisher Delta Research Extension and Education Center, University of Missouri, Portageville, 63873, MO, United States
Xiaolei Huang
University of Memphis
Machine Learning, Natural Language Processing, Health Informatics, LLM for Sciences