Inference to the Best Explanation in Large Language Models

📅 2024-02-16
🏛️ Annual Meeting of the Association for Computational Linguistics
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of verifiable rationality assessment for causal explanations generated by large language models (LLMs). To this end, the authors propose IBE-Eval, the first framework to formalize the philosophical principle of Inference to the Best Explanation (IBE) as a computationally tractable evaluation paradigm. IBE-Eval jointly leverages logical rule modeling and linguistic feature analysis to quantify, without supervision, four key explanatory qualities: consistency, parsimony, coherence, and uncertainty. On causal question-answering benchmarks it achieves 77% accuracy, outperforming a random baseline by roughly 27 points and a GPT-3.5-as-a-Judge baseline by roughly 17 points, while exhibiting strong agreement with human judgments (Spearman's ρ > 0.85). Its core contributions are threefold: (1) the first computationally grounded instantiation of IBE for evaluating LLM explanations; (2) high discriminative power, intrinsic interpretability, and cross-model robustness; and (3) a principled paradigm for assessing the trustworthiness of LLM-generated causal explanations.
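
To make the selection paradigm concrete, below is a minimal, hypothetical sketch of ranking competing explanations by a weighted combination of the four IBE criteria. The class, function names, weights, and example scores are illustrative assumptions; the actual IBE-Eval framework derives these signals from logical rule modeling and linguistic analysis rather than hand-assigned values.

```python
from dataclasses import dataclass

@dataclass
class IBEScores:
    """Illustrative per-explanation scores for the four IBE criteria (all in [0, 1])."""
    consistency: float   # logical agreement between explanation and answer
    parsimony: float     # simplicity, e.g., fewer inference steps
    coherence: float     # how well the explanation steps hang together
    certainty: float     # 1 - uncertainty (fewer hedging markers)

def plausibility(s: IBEScores, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine criterion scores into a single plausibility estimate (illustrative uniform weighting)."""
    w_con, w_par, w_coh, w_cer = weights
    return (w_con * s.consistency + w_par * s.parsimony
            + w_coh * s.coherence + w_cer * s.certainty)

def select_best(candidates: dict[str, IBEScores]) -> str:
    """Pick the candidate explanation with the highest plausibility score."""
    return max(candidates, key=lambda name: plausibility(candidates[name]))

# Usage: two competing causal explanations with hypothetical scores.
candidates = {
    "explanation_A": IBEScores(consistency=0.9, parsimony=0.7, coherence=0.8, certainty=0.85),
    "explanation_B": IBEScores(consistency=0.6, parsimony=0.9, coherence=0.5, certainty=0.70),
}
print(select_best(candidates))  # -> "explanation_A"
```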

📝 Abstract
While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts on Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features including: consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈ 27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈ +17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
Problem

Research questions and friction points this paper is trying to address.

Understanding the explanatory processes underlying Large Language Models (LLMs).
Estimating the plausibility of natural language explanations generated by LLMs.
Selecting the best causal explanation among competing LLM-generated candidates.
Innovation

Methods, ideas, or system contributions that make the work stand out.

IBE-Eval, a framework that operationalizes Inference to the Best Explanation to evaluate LLM-generated explanations
Combines explicit logical and linguistic features (consistency, parsimony, coherence, uncertainty) to estimate plausibility; a small linguistic-feature sketch follows this list
Improves accuracy over random and GPT 3.5-as-a-Judge baselines when selecting the best causal explanation
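
To illustrate the linguistic side of the feature set, here is a small, hypothetical example of one way an uncertainty signal could be approximated from surface text. The hedge-word lexicon and the token-fraction scoring are illustrative assumptions, not the paper's actual feature extraction.

```python
import re

# Illustrative list of hedging markers; the paper's actual linguistic features
# (and any lexicon they rely on) may differ.
HEDGE_MARKERS = {"might", "may", "could", "possibly", "perhaps", "likely", "probably"}

def uncertainty_score(explanation: str) -> float:
    """Rough uncertainty proxy: fraction of tokens that are hedging markers."""
    tokens = re.findall(r"[a-z']+", explanation.lower())
    if not tokens:
        return 0.0
    hedges = sum(token in HEDGE_MARKERS for token in tokens)
    return hedges / len(tokens)

# Usage: a more hedged explanation receives a higher uncertainty score.
print(uncertainty_score("Smoking causes lung damage, which leads to cancer."))
print(uncertainty_score("Smoking might possibly cause damage that could perhaps lead to cancer."))
```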
Dhairya Dalal
SFI Centre for Research and Training in Artificial Intelligence, University of Galway, Ireland
Marco Valentino
University of Sheffield
Natural Language Processing, Neurosymbolic AI, Explanation
André Freitas
Department of Computer Science, University of Manchester, UK; National Biomarker Centre, CRUK-MI, University of Manchester, UK; Idiap Research Institute, Switzerland
Paul Buitelaar
Professor in Data Analytics, Data Science Institute, University of Galway; Co-PI, Insight Centre
Natural Language Processing, Knowledge Graphs, Text Mining, Semantics