🤖 AI Summary
Existing XAI explanations often lack accessibility for non-experts, which hinders the establishment of trust. Method: This paper proposes the first iterative refinement framework that integrates SHAP-based interpretability with a multimodal large language model (LLM) autonomous agent, instantiated in rice yield recommendation. Adopting an agentic AI paradigm, it conducts 11 rounds of human-AI co-refinement to generate progressively improved explanations and establishes a seven-dimensional hybrid evaluation framework combining human judgment with LLM assessment. Contribution/Results: The study uncovers a non-monotonic evolution pattern in explanation quality: quality peaks at Rounds 3–4 (average improvement of 30–33%) and then declines significantly with excessive iteration. Based on this, the authors propose an early-stopping regularization principle, challenging the conventional assumption that more iterations are better and providing critical methodological grounding for practical XAI deployment.
📝 Abstract
Explainable artificial intelligence (XAI) enables data-driven understanding of how factors are associated with response variables, yet communicating XAI outputs to laypersons remains challenging, hindering trust in AI-based predictions. Large language models (LLMs) have emerged as promising tools for translating technical explanations into accessible narratives; however, the integration of agentic AI, in which LLMs operate as autonomous agents through iterative refinement, with XAI remains unexplored. This study proposes an agentic XAI framework that combines SHAP-based explainability with multimodal LLM-driven iterative refinement to generate progressively enhanced explanations. As a use case, we tested the framework as an agricultural recommendation system using rice yield data from 26 fields in Japan. The agentic XAI system first produced a SHAP-based explanation and then iteratively explored how to improve it through additional analysis across 11 refinement rounds (Rounds 0-10). Explanations were evaluated by human experts (crop scientists, n=12) and LLMs (n=14) against seven metrics: Specificity, Clarity, Conciseness, Practicality, Contextual Relevance, Cost Consideration, and Crop Science Credibility. Both evaluator groups confirmed that the framework enhanced recommendation quality, with average scores increasing by 30-33% from Round 0 and peaking at Rounds 3-4. However, excessive refinement caused a substantial drop in recommendation quality, indicating a bias-variance trade-off: as revealed by metric-specific analysis, early rounds lacked explanatory depth (bias), while excessive iteration introduced verbosity and ungrounded abstraction (variance). These findings suggest that strategic early stopping (regularization) is needed to optimize practical utility, challenging assumptions of monotonic improvement and providing evidence-based design principles for agentic XAI systems.
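The refinement-with-early-stopping procedure described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `generate_explanation` and `evaluate` are hypothetical stand-ins for the multimodal LLM agent and the seven-metric evaluation, and the stub scores merely mimic the reported rise-peak-decline pattern.

```python
# Illustrative sketch (assumptions, not the paper's code): an agentic XAI
# refinement loop that stops early once explanation quality stalls.

def generate_explanation(previous, round_idx):
    """Hypothetical stand-in for the LLM agent refining a SHAP-based explanation."""
    return f"{previous} [refined in round {round_idx}]"

# Stub scores mimicking the reported non-monotonic pattern:
# quality rises, peaks around Rounds 3-4, then declines.
STUB_SCORES = [3.0, 3.4, 3.8, 4.0, 3.95, 3.7, 3.5, 3.3, 3.2, 3.1, 3.0]

def evaluate(round_idx):
    """Hypothetical stand-in for averaging the seven evaluation metrics."""
    return STUB_SCORES[round_idx]

def refine_with_early_stopping(max_rounds=11, patience=2):
    """Iterate refinement; stop when quality fails to improve for
    `patience` consecutive rounds (early-stopping regularization)."""
    explanation = "Round 0: raw SHAP-based recommendation"
    best_round, best_score = 0, evaluate(0)
    best_explanation, stall = explanation, 0
    for r in range(1, max_rounds):
        explanation = generate_explanation(explanation, r)
        score = evaluate(r)
        if score > best_score:
            best_round, best_score, stall = r, score, 0
            best_explanation = explanation
        else:
            stall += 1
            if stall >= patience:
                break  # further iteration only degrades quality
    return best_round, best_score, best_explanation

best_round, best_score, _ = refine_with_early_stopping()
print(best_round, best_score)  # peaks at Round 3 under these stub scores
```

With a patience of two rounds, the loop halts after Round 5 and returns the Round 3 explanation, mirroring the paper's finding that quality peaks at Rounds 3-4 rather than improving monotonically.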