🤖 AI Summary
Large language models (LLMs) frequently generate fluent but factually incorrect outputs (hallucinations), and existing abstention methods rely predominantly on post-generation signals, so they cannot flag unreliable responses before generation. Method: We propose Aspect-Based Causal Abstention (ABCA), a framework for early abstention that applies causal inference to the internal diversity of an LLM's parametric knowledge. ABCA decomposes the knowledge relevant to a query along semantic aspects (e.g., disciplines, legal contexts, or temporal frames) and estimates causal effects conditioned on each aspect, enabling pre-generation reliability assessment. It explicitly distinguishes two abstention triggers: "knowledge conflict" (Type-1), where aspect effects are inconsistent, and "knowledge insufficiency" (Type-2), where aspect effects consistently support abstention, improving the interpretability and transparency of abstention decisions. Contribution/Results: Evaluated on standard benchmarks, ABCA achieves state-of-the-art abstention performance, outperforming prior approaches, and substantially improves the model's ability to detect and abstain on out-of-distribution or previously unseen queries.
📝 Abstract
Large Language Models (LLMs) often produce fluent but factually incorrect responses, a phenomenon known as hallucination. Abstention, where the model chooses not to answer and instead outputs phrases such as "I don't know", is a common safeguard. However, existing abstention methods typically rely on post-generation signals, such as generation variations or feedback, which limits their ability to prevent unreliable responses in advance. In this paper, we introduce Aspect-Based Causal Abstention (ABCA), a new framework that enables early abstention by analysing the internal diversity of LLM knowledge through causal inference. This diversity reflects the multifaceted nature of parametric knowledge acquired from various sources, representing diverse aspects such as disciplines, legal contexts, or temporal frames. ABCA estimates causal effects conditioned on these aspects to assess the reliability of knowledge relevant to a given query. Based on these estimates, we enable two types of abstention: Type-1, where aspect effects are inconsistent (knowledge conflict), and Type-2, where aspect effects consistently support abstention (knowledge insufficiency). Experiments on standard benchmarks demonstrate that ABCA improves abstention reliability, achieves state-of-the-art performance, and enhances the interpretability of abstention decisions.
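The two abstention types described above can be illustrated with a minimal sketch. The decision rule below is hypothetical (the function name, thresholds, and effect scale are illustrative assumptions, not taken from the paper): given per-aspect causal effect estimates, strong disagreement between aspects triggers Type-1 abstention (knowledge conflict), while uniformly weak support triggers Type-2 abstention (knowledge insufficiency).

```python
import statistics

def abca_abstention_sketch(aspect_effects,
                           conflict_threshold=0.5,
                           support_threshold=0.3):
    """Illustrative decision rule inspired by ABCA's two abstention types.

    aspect_effects: mapping from aspect name (e.g. "discipline", "legal",
    "temporal") to an estimated causal effect of that aspect's knowledge on
    answer reliability, here assumed to lie in [-1, 1] (positive = supports
    answering). Thresholds are placeholders, not values from the paper.
    """
    effects = list(aspect_effects.values())
    # Type-1: aspect effects are inconsistent -> knowledge conflict.
    if max(effects) - min(effects) > conflict_threshold:
        return "abstain (Type-1: knowledge conflict)"
    # Type-2: aspect effects consistently fail to support answering
    # -> knowledge insufficiency.
    if statistics.mean(effects) < support_threshold:
        return "abstain (Type-2: knowledge insufficiency)"
    return "answer"
```

For example, effects of 0.9 and -0.2 across two aspects would signal conflict, effects of 0.1 and 0.2 would signal insufficiency, and effects of 0.6 and 0.8 would permit answering.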