The Future of Facts: Tracing the Factual Generation-Verification Gap

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the generation-verification gap (GV-gap) in language models, the tendency to verify outputs more reliably than to generate them, as it applies to factual knowledge, where its training dynamics remain poorly understood. Distinguishing the factual GV-gap from its computational and aesthetic counterparts, the work traces generation and verification abilities across four open-source model families at two scales each, through three training phases: knowledge acquisition, continual learning, and knowledge updating. Three findings recur across models: verification is consistently learned before generation; verification is more robust to continual learning than generation; and after factual updates, models can accept both old and new answers as correct, a "multi-verse" state. Natural experiments on frontier models reproduce these dynamics at scale and show residual verification biases even on well-covered facts.

📝 Abstract

Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate them. This generation-verification gap (GV-gap) underlies many recent advances in self-improvement and reasoning, but its dynamics on factual knowledge specifically remain poorly understood. We focus on the training mechanisms underlying factual GV-gaps, distinguishing them from their computational and aesthetic counterparts. We trace generation and verification capabilities through three training phases (acquisition, continual learning, and updating) across four open-source model families at two scales each. Three findings recur across models: (i) verification is consistently learned before generation; (ii) verification is more robust to continual learning than generation; and (iii) factual updates can leave models in a "multi-verse" state, simultaneously verifying both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale and reveal residual verification biases on well-covered facts.
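
To make the quantities in the abstract concrete, here is a minimal sketch of how a GV-gap and a "multi-verse" rate might be computed from per-fact evaluation records. This is an illustration under stated assumptions, not the paper's metric: the `FactRecord` fields, the definition of the gap as verification accuracy minus generation accuracy, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FactRecord:
    """One fact evaluated after an update (hypothetical schema)."""
    generated_correct: bool  # open-ended generation produced the new answer
    verifies_new: bool       # model judged the new answer as correct
    verifies_old: bool       # model judged the outdated answer as correct


def gv_gap(records):
    """Assumed definition: verification accuracy minus generation accuracy."""
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    gen_acc = sum(r.generated_correct for r in records) / n
    ver_acc = sum(r.verifies_new for r in records) / n
    return ver_acc - gen_acc


def multiverse_rate(records):
    """Fraction of facts where both old and new answers are accepted."""
    if not records:
        raise ValueError("need at least one record")
    both = sum(r.verifies_new and r.verifies_old for r in records)
    return both / len(records)


# Toy data, invented purely for illustration.
records = [
    FactRecord(generated_correct=True, verifies_new=True, verifies_old=False),
    FactRecord(generated_correct=False, verifies_new=True, verifies_old=True),
    FactRecord(generated_correct=False, verifies_new=True, verifies_old=True),
    FactRecord(generated_correct=False, verifies_new=False, verifies_old=True),
]

print(f"GV-gap: {gv_gap(records):.2f}")
print(f"multi-verse rate: {multiverse_rate(records):.2f}")
```

A positive gap here means the model verifies more facts than it can generate. A fuller verification score would also count correct rejections of wrong answers, which this sketch omits for brevity.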

Problem

Research questions and friction points this paper is trying to address.

generation-verification gap

factual knowledge

language models

training dynamics

model updating

Innovation

Methods, ideas, or system contributions that make the work stand out.

generation-verification gap

factual knowledge

continual learning