Memorization vs. Reasoning: Updating LLMs with New Knowledge

📅 2025-04-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing approaches to updating knowledge in large language models (LLMs) focus mainly on entity substitutions and fail to capture complex, dynamic real-world knowledge evolution. To address this, the paper proposes the Knowledge Update Playground (KUP), an automated benchmark that jointly evaluates memorization of updated facts and reasoning over them. It further introduces memory conditioned training (MCT), a lightweight method that conditions tokens in the update corpus on self-generated memory tokens during training, encouraging the model to surface and reason over newly memorized knowledge at inference, whereas conventional continued pre-training tends to reinforce only surface-level memorization. KUP's evaluation framework combines direct and indirect probing over an evidence corpus and proves highly challenging: the best continued pre-training (CPT) models achieve under 2% accuracy on indirect (reasoning) probes. Empirically, MCT improves direct probing (memorization) results by up to 25.4%, significantly outperforming prior CPT baselines.

📝 Abstract
Large language models (LLMs) encode vast amounts of pre-trained knowledge in their parameters, but updating them as real-world information evolves remains a challenge. Existing methodologies and benchmarks primarily target entity substitutions, failing to capture the full breadth of complex real-world dynamics. In this paper, we introduce Knowledge Update Playground (KUP), an automatic pipeline for simulating realistic knowledge updates reflected in an evidence corpus. KUP's evaluation framework includes direct and indirect probes to test both memorization of updated facts and reasoning over them, for any update learning method. Next, we present a lightweight method called memory conditioned training (MCT), which conditions tokens in the update corpus on self-generated "memory" tokens during training. Our strategy encourages LLMs to surface and reason over newly memorized knowledge at inference. Our results on two strong LLMs show that (1) the KUP benchmark is highly challenging, with the best CPT models achieving $<2\%$ in the indirect probing setting (reasoning), and (2) MCT training significantly outperforms prior continued pre-training (CPT) baselines, improving direct probing (memorization) results by up to $25.4\%$.
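The abstract describes MCT as conditioning update-corpus tokens on self-generated memory tokens during training. As a rough illustration only (the function name, token ids, and loss-masking scheme below are assumptions, not the authors' implementation), one common way to realize this kind of conditioning is to prefix each update passage with the memory tokens and mask the loss over the prefix, following the PyTorch convention of using `-100` as the ignored label index:

```python
def build_mct_example(memory_ids, update_ids, ignore_index=-100):
    """Build one memory-conditioned training example (illustrative sketch).

    The self-generated memory tokens act as a conditioning prefix: they are
    part of the model input, but their positions are masked out of the loss
    (labels set to `ignore_index`), so gradients flow only through the
    update-corpus tokens conditioned on the memory.
    """
    input_ids = list(memory_ids) + list(update_ids)
    labels = [ignore_index] * len(memory_ids) + list(update_ids)
    return input_ids, labels

# Hypothetical token ids: a memory prefix recalling related prior knowledge,
# followed by the update passage whose tokens the model should learn.
memory = [101, 7, 42]        # e.g. "<mem> recalled fact"
update = [55, 56, 57, 58]    # e.g. tokens of the updated fact
inp, lab = build_mct_example(memory, update)
```

The `(inp, lab)` pair could then be fed to a standard causal-LM training loop, where the masked prefix still shapes the model's predictions without itself being a training target.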
Problem

Research questions and friction points this paper is trying to address.

Updating LLMs with evolving real-world knowledge effectively
Evaluating both memorization and reasoning in knowledge updates
Improving update methods with memory-conditioned training (MCT)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Knowledge Update Playground (KUP) benchmark
Proposes memory conditioned training (MCT) method
Enhances memorization and reasoning in LLMs