🤖 AI Summary
This work addresses the lack of verifiable rationality assessment for causal explanations generated by large language models (LLMs). To this end, we propose IBE-Eval, the first framework that formalizes the philosophical principle of Inference to the Best Explanation (IBE) into a computationally tractable evaluation paradigm. IBE-Eval jointly leverages logical rule modeling and linguistic feature analysis to quantify, without supervision, four key explanatory qualities: consistency, simplicity, coherence, and uncertainty. On causal question-answering benchmarks, it achieves 77% accuracy, roughly 27 points above a random baseline and 17 points above a GPT-3.5-as-a-Judge baseline, and exhibits strong agreement with human judgments (Spearman's ρ > 0.85). Its core contributions are threefold: (1) the first computationally grounded instantiation of IBE for evaluating LLM explanations; (2) high discriminative power, intrinsic interpretability, and cross-model robustness; and (3) a novel, principled paradigm for assessing the trustworthiness of LLM-generated causal explanations.
📝 Abstract
While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation among competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈+17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria, and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for the future development of automated explanation verification tools.
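To make the selection setup concrete, the sketch below ranks competing candidate explanations by combining per-criterion scores into a single plausibility score. This is a minimal illustration of the general idea, not the paper's implementation: the `Explanation` dataclass, the criterion values, and the weights are all assumptions introduced here for demonstration, and the real framework derives its features from logical rule modeling and linguistic analysis rather than hand-assigned numbers.

```python
# Illustrative sketch only: all scores and weights below are hypothetical,
# not IBE-Eval's actual feature extractors.
from dataclasses import dataclass

@dataclass
class Explanation:
    text: str
    consistency: float  # logical consistency with the question (0-1)
    parsimony: float    # simplicity: fewer assumptions scores higher (0-1)
    coherence: float    # internal linguistic coherence (0-1)
    certainty: float    # 1 - uncertainty; heavily hedged language scores lower (0-1)

def ibe_score(e: Explanation, weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Weighted combination of the four IBE criteria (weights are assumed)."""
    features = (e.consistency, e.parsimony, e.coherence, e.certainty)
    return sum(w * f for w, f in zip(weights, features))

def best_explanation(candidates: list[Explanation]) -> Explanation:
    """Select the most plausible explanation among competing candidates."""
    return max(candidates, key=ibe_score)

candidates = [
    Explanation("Rain made the road slippery, causing the crash.",
                consistency=0.9, parsimony=0.8, coherence=0.85, certainty=0.9),
    Explanation("A chain of unrelated events might perhaps have caused it.",
                consistency=0.6, parsimony=0.3, coherence=0.5, certainty=0.4),
]
print(best_explanation(candidates).text)
```

The weighted-sum combiner is the simplest possible aggregation; any monotone combination of the criteria would fit the same selection interface.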