🤖 AI Summary
This work addresses the lack of verifiable rationality assessment for causal explanations generated by large language models (LLMs). To this end, we propose IBE-Eval, the first framework that formalizes the philosophical principle of Inference to the Best Explanation (IBE) into a computationally tractable evaluation paradigm. IBE-Eval jointly leverages logical rule modeling and linguistic feature analysis to quantify, without supervision, four key explanatory qualities: consistency, simplicity, coherence, and uncertainty. On causal question-answering benchmarks, it achieves 77% accuracy, roughly 27 points above a random baseline and 17 points above a GPT-3.5-as-a-Judge baseline, and exhibits strong agreement with human judgments (Spearman's ρ > 0.85). Its core contributions are threefold: (1) the first computationally grounded instantiation of IBE for evaluating LLM explanations; (2) high discriminative power, intrinsic interpretability, and cross-model robustness; and (3) a novel, principled paradigm for assessing the trustworthiness of LLM-generated causal explanations.
📝 Abstract
While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation among competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈+17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria, and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for the future development of automated explanation verification tools.
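To make the selection setup concrete, the sketch below ranks competing candidate explanations by combining per-criterion scores into a single plausibility score. This is a minimal illustration of the general idea, not the paper's implementation: the `Explanation` dataclass, the criterion values, and the weights are all assumptions introduced here for demonstration, and the real framework derives its features from logical rule modeling and linguistic analysis rather than hand-assigned numbers.

```python
# Illustrative sketch only: all scores and weights below are hypothetical,
# not IBE-Eval's actual feature extractors.
from dataclasses import dataclass

@dataclass
class Explanation:
    text: str
    consistency: float  # logical consistency with the question (0-1)
    parsimony: float    # simplicity: fewer assumptions scores higher (0-1)
    coherence: float    # internal linguistic coherence (0-1)
    certainty: float    # 1 - uncertainty; heavily hedged language scores lower (0-1)

def ibe_score(e: Explanation, weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Weighted combination of the four IBE criteria (weights are assumed)."""
    features = (e.consistency, e.parsimony, e.coherence, e.certainty)
    return sum(w * f for w, f in zip(weights, features))

def best_explanation(candidates: list[Explanation]) -> Explanation:
    """Select the most plausible explanation among competing candidates."""
    return max(candidates, key=ibe_score)

candidates = [
    Explanation("Rain made the road slippery, causing the crash.",
                consistency=0.9, parsimony=0.8, coherence=0.85, certainty=0.9),
    Explanation("A chain of unrelated events might perhaps have caused it.",
                consistency=0.6, parsimony=0.3, coherence=0.5, certainty=0.4),
]
print(best_explanation(candidates).text)
```

The weighted-sum combiner is the simplest possible aggregation; any monotone combination of the criteria would fit the same selection interface.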