Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two limitations of crop pest and disease diagnosis from images: models often hallucinate species names, and the reasoning behind their predictions is hard to inspect. The authors propose Agri-CPJ, a training-free framework in which a large vision-language model generates and iteratively refines a structured morphological caption before answering any diagnostic question. Two candidate responses are then generated from complementary viewpoints, and an LLM-as-a-Judge mechanism selects between them, producing a traceable diagnostic audit trail. On CDDMBench, the approach improves disease classification by 22.7 percentage points and QA score by 19.5 points over no-caption baselines. On AgMMU-MCQs, GPT-5-Nano reaches 77.84%, at or above most open-source models of comparable scale.

📝 Abstract

Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and when predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, before any diagnostic question is answered. Two candidate responses are then generated from complementary viewpoints, and an LLM judge selects the stronger one based on domain-specific criteria. Caption refinement is the component with the largest individual impact: ablations confirm that skipping it consistently degrades downstream accuracy across both models tested. On CDDMBench, pairing GPT-5-Nano with GPT-5-mini-generated captions yields +22.7 pp in disease classification and +19.5 points in QA score over no-caption baselines. Evaluated without modification on AgMMU-MCQs, GPT-5-Nano reached 77.84% and Qwen-VL-Chat reached 64.54%, placing them at or above most open-source models of comparable scale despite the format shift from open-ended to multiple-choice. The structured caption and judge rationale together constitute a readable audit trail: a practitioner who disagrees with a diagnosis can identify the specific caption observation that was incorrect. Code and data are publicly available at https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis
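
The control flow described in the abstract (caption, quality-gated refinement, two candidate answers, judge selection) might be sketched roughly as follows. This is an illustrative outline only, not the authors' released code: every name here (`refine_caption`, `diagnose`, the `threshold` and `max_rounds` parameters, the score dimensions, and the viewpoint labels) is a hypothetical placeholder, and the model calls are passed in as plain callables so that the sketch runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callable signatures standing in for real VLM / LLM calls.
Captioner = Callable[[str, str], str]          # (image_ref, feedback) -> caption
Scorer = Callable[[str], Dict[str, float]]     # caption -> per-dimension scores in [0, 1]
Answerer = Callable[[str, str, str], str]      # (caption, question, viewpoint) -> answer
Judge = Callable[[str, str, List[str]], int]   # (caption, question, candidates) -> winner index


@dataclass
class Diagnosis:
    caption: str            # refined caption: first half of the audit trail
    candidates: List[str]   # one answer per viewpoint
    chosen: str             # answer selected by the judge
    rounds: int             # caption attempts that were needed


def refine_caption(image_ref: str, captioner: Captioner, scorer: Scorer,
                   threshold: float = 0.8, max_rounds: int = 3) -> Tuple[str, int]:
    """Regenerate the caption until every quality dimension passes the gate.

    The threshold and round limit are assumed values, not taken from the paper.
    """
    feedback = ""
    caption = ""
    for round_no in range(1, max_rounds + 1):
        caption = captioner(image_ref, feedback)
        scores = scorer(caption)
        weak = sorted(dim for dim, score in scores.items() if score < threshold)
        if not weak:
            return caption, round_no
        # Tell the captioner which dimensions fell short on this attempt.
        feedback = "Improve: " + ", ".join(weak)
    return caption, max_rounds  # best effort once the round budget is spent


def diagnose(image_ref: str, question: str, captioner: Captioner, scorer: Scorer,
             answerer: Answerer, judge: Judge,
             viewpoints: Sequence[str] = ("viewpoint_a", "viewpoint_b")) -> Diagnosis:
    """Caption first, then answer from each viewpoint, then let the judge pick."""
    caption, rounds = refine_caption(image_ref, captioner, scorer)
    candidates = [answerer(caption, question, v) for v in viewpoints]
    winner = judge(caption, question, candidates)
    return Diagnosis(caption, candidates, candidates[winner], rounds)


# Usage with toy stubs in place of real models (all behaviour below is made up).
def toy_captioner(image_ref: str, feedback: str) -> str:
    if feedback:
        return "leaf with brown concentric-ring lesions and a yellow halo"
    return "leaf with brown spots"


def toy_scorer(caption: str) -> Dict[str, float]:
    return {"specificity": 1.0 if "halo" in caption else 0.4, "completeness": 0.9}


def toy_answerer(caption: str, question: str, viewpoint: str) -> str:
    return f"[{viewpoint}] answer grounded in: {caption}"


def toy_judge(caption: str, question: str, candidates: List[str]) -> int:
    return 1  # a real judge would compare candidates against domain criteria


result = diagnose("leaf.jpg", "What disease is this?",
                  toy_captioner, toy_scorer, toy_answerer, toy_judge)
print(result.rounds, result.chosen)
```

Keeping the refined caption and the full candidate list in the returned record mirrors the audit-trail idea: a reader who disagrees with `chosen` can inspect `caption` to see which observation the answer relied on.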

Problem

Research questions and friction points this paper is trying to address.

agricultural pest diagnosis

hallucination

explainability

vision-language models

model interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free framework

structured morphological caption

LLM-as-a-Judge