🤖 AI Summary
Balancing readability with scientific accuracy remains challenging in climate-adaptation question-answering (QA) systems, particularly for agricultural advisors. Method: We propose a trustworthy QA framework tailored to agricultural advisors, built around a structured ScholarGuide prompting mechanism that integrates large language models (LLMs), cross-model consistency analysis, and domain expertise. A consistency-weighted hybrid evaluator enables domain-anchored, verifiable QA generation without fine-tuning or reinforcement learning. The framework jointly processes unstructured literature and structured climate data. Contribution/Results: On an expert-annotated dataset, our method significantly outperforms baselines across most metrics, and ablation studies confirm the efficacy of each component. LLM-based automatic evaluation correlates strongly with human judgments (Spearman’s ρ > 0.85), validating the framework’s reliability and practical utility for climate-adaptation QA.
📝 Abstract
Climate adaptation strategies, proposed in response to climate change, are practised in agriculture to sustain food production. These strategies can be found in unstructured data (for example, scientific literature from the Elsevier website) or structured data (heterogeneous climate data accessed via government APIs). We present Climate Adaptation question-answering with Improved Readability and Noted Sources (CAIRNS), a framework that enables experts -- farmer advisors -- to obtain credible preliminary answers from complex evidence sources on the web. It enhances readability and citation reliability through a structured ScholarGuide prompt, and achieves robust evaluation via a consistency-weighted hybrid evaluator that leverages inter-model agreement together with expert knowledge. Together, these components enable readable, verifiable, and domain-grounded question-answering without fine-tuning or reinforcement learning. Using a previously reported dataset of expert-curated question-answer pairs, we show that CAIRNS outperforms the baselines on most metrics, and a thorough ablation study confirms these results on all metrics. To validate our LLM-based evaluation, we also report a correlation analysis against human judgments.