GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics

πŸ“… 2026-04-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of detecting internal knowledge insufficiency in large language models when answering specific questions by proposing a novel subspace analysis approach. It introduces, for the first time, the cross-layer rank-ratio dynamics between hidden-state subspaces and gradient subspaces to quantify the gap between the knowledge a query requires and the knowledge the model actually activates, thereby enabling detection of knowledge gaps. Combining gradient dynamics, representational analysis, knowledge-activation assessment, and input-perturbation validation, the method demonstrates strong effectiveness and robustness across six benchmarks. It also provides interpretable analyses of knowledge deficiencies in long-form model responses, offering valuable support for the responsible deployment of language models.
πŸ“ Abstract
Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying responsible LLMs. Beyond verbalised confidence from LLM self-reports, more recent methods probe model internals, such as the hidden states of response tokens, to capture how much knowledge is activated. We argue that such activated knowledge may not align with what the query requires, e.g., it may capture stylistic and length-related features that are uninformative for answering the query. To fill this gap, we propose GRADE (Gradient Dynamics for knowledge gap detection), which quantifies the knowledge gap via the cross-layer rank ratio of the gradient subspace to that of the corresponding hidden-state subspace. This is motivated by the property of gradients as estimators of the knowledge updates required for a given target. We validate GRADE on six benchmarks, demonstrating its effectiveness and robustness to input perturbations. In addition, we present a case study showing how the gradient chain can generate interpretable explanations of knowledge gaps for long-form answers.
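The core quantity described in the abstract, a per-layer ratio of gradient-subspace rank to hidden-state-subspace rank, can be illustrated with a minimal NumPy sketch. This is not the paper's actual formulation: the energy-threshold definition of effective rank, the averaging over layers, and the function names (`effective_rank`, `knowledge_gap_score`) are all illustrative assumptions.

```python
import numpy as np

def effective_rank(matrix, tau=0.99):
    # Effective rank: smallest k such that the top-k singular values
    # account for at least a tau fraction of the total spectral energy.
    s = np.linalg.svd(matrix, compute_uv=False)
    energy = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(energy, tau) + 1)

def knowledge_gap_score(hidden_states, gradients, tau=0.99):
    # Cross-layer rank ratio (illustrative): for each layer, divide the
    # effective rank of the gradient matrix by that of the hidden-state
    # matrix, then average over layers. Intuitively, a higher ratio means
    # the required knowledge update spans a richer subspace than what the
    # model activated, i.e., a larger knowledge gap.
    ratios = [
        effective_rank(g, tau) / effective_rank(h, tau)
        for h, g in zip(hidden_states, gradients)
    ]
    return float(np.mean(ratios))

# Toy example: 4 layers, 16 response tokens, hidden size 64.
rng = np.random.default_rng(0)
hs = [rng.standard_normal((16, 64)) for _ in range(4)]
gr = [rng.standard_normal((16, 64)) for _ in range(4)]
score = knowledge_gap_score(hs, gr)
```

In practice the hidden-state matrices would come from a forward pass over the response tokens and the gradient matrices from backpropagating a target loss, with the score thresholded to flag queries whose required knowledge the model has not activated.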
Problem

Research questions and friction points this paper is trying to address.

knowledge gap
large language models
gradient dynamics
model reliability
internal knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge gap detection
gradient subspace
hidden state dynamics
large language models
interpretability
πŸ”Ž Similar Papers
No similar papers found.