Inside-Out: Hidden Factual Knowledge in LLMs

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) internally encode more factual knowledge than they express in their outputs. The authors formally define knowledge for a given question as the fraction of correct-incorrect answer pairs in which the correct answer is ranked higher, yielding an external measure (scoring candidates by the model's observable token-level probabilities) and an internal measure (scoring them from intermediate computations); hidden knowledge arises when the internal measure exceeds the external one. In a closed-book question-answering case study on three open-weights LLMs, internal knowledge exceeds external knowledge by an average of 40%. Strikingly, some answers are known internally yet never generated even once across 1,000 sampled responses. This exposes a systematic gap between the models' representational capacity and their generation mechanism, and it places a practical limit on scaling test-time compute via repeated answer sampling: some answers that would be ranked first if sampled are, in practice, never produced.

📝 Abstract
This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated this phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model's observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than what they express externally, with an average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly, yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) puts a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, yet if they were, we would be guaranteed to rank them first.
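The abstract's knowledge measure (the fraction of correct-incorrect answer pairs where the correct candidate is ranked higher) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name and the example scores are assumptions, and the same function covers both external knowledge (candidates scored by token-level probabilities) and internal knowledge (candidates scored from intermediate computations, e.g. a probe on hidden states).

```python
from itertools import product

def knowledge_score(correct_scores, incorrect_scores):
    """Fraction of (correct, incorrect) answer pairs for one question
    in which the correct candidate receives the strictly higher score."""
    pairs = list(product(correct_scores, incorrect_scores))
    wins = sum(1 for c, i in pairs if c > i)
    return wins / len(pairs)

# Hypothetical scores for one question.
# External: log-probabilities; the correct answer ranks below two wrong ones.
external = knowledge_score([-4.1], [-2.0, -3.5, -5.2])   # 1/3
# Internal: probe scores on hidden states; the correct answer ranks first.
internal = knowledge_score([0.91], [0.40, 0.55, 0.13])   # 1.0
hidden_gap = internal - external
```

Hidden knowledge corresponds to `hidden_gap > 0`: the model separates correct from incorrect answers internally better than its output probabilities do. With one correct candidate, this measure coincides with a pairwise ranking accuracy (an AUC-style statistic) over the answer pool.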
Problem

Research questions and friction points this paper is trying to address.

Assessing hidden factual knowledge in LLMs
Quantifying internal vs. external knowledge in LLMs
Exploring limitations in LLM generation capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines internal vs external knowledge in LLMs
Quantifies hidden knowledge using answer ranking
Reveals LLMs' generation limitations via repeated sampling