🤖 AI Summary
LLMs suffer from catastrophic forgetting during continual knowledge updating and, unlike humans, struggle to identify and resolve conflicts between new and existing knowledge. This paper is the first to bring cognitive dissonance theory into LLM knowledge updating, proposing a conflict-aware update paradigm grounded in human cognitive mechanisms. Specifically, it detects dissonant information via activation and gradient features, tracks neuron activation dynamics to distinguish frequently used ("stubborn") from rarely used ("plastic") parameters, and applies targeted parameter updates accordingly. Experiments show that dissonance can be detected efficiently with minimal overhead; non-dissonant updates preserve prior knowledge almost perfectly, whereas dissonant updates cause global degradation of even unrelated knowledge, revealing a fundamental structural fragility in current LLM knowledge representations. This work establishes a novel paradigm and empirical foundation for building robust, evolvable language model knowledge architectures.
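The detection step described above can be sketched as a simple rule-based classifier over two cheap signals. This is a hypothetical illustration, not the paper's implementation: the feature names (`mean_activation`, `grad_norm`) and thresholds are assumptions. The intuition is that familiar facts yield small loss gradients, dissonant facts yield large gradients despite strong (familiar-looking) activations, and novel facts yield weak activations with large gradients.

```python
def classify_information(mean_activation: float, grad_norm: float,
                         act_thresh: float = 0.5, grad_thresh: float = 1.0) -> str:
    """Toy novel/familiar/dissonant classifier from activation and gradient
    features. Thresholds are illustrative assumptions, not the paper's values."""
    if grad_norm < grad_thresh:
        return "familiar"   # model already encodes this; little to learn
    if mean_activation > act_thresh:
        return "dissonant"  # strongly activated yet high gradient: conflict
    return "novel"          # weakly activated and high gradient: new information

print(classify_information(0.8, 0.2))  # familiar
print(classify_information(0.8, 2.0))  # dissonant
print(classify_information(0.1, 2.0))  # novel
```

In practice the paper learns this distinction from model-internal features rather than fixed thresholds, but the two-signal structure is the same.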
📝 Abstract
Despite remarkable capabilities, large language models (LLMs) struggle to continually update their knowledge without catastrophic forgetting. In contrast, humans effortlessly integrate new information, detect conflicts with existing beliefs, and selectively update their mental models. This paper introduces a cognitive-inspired investigation paradigm to study continual knowledge updating in LLMs. We implement two key components inspired by human cognition: (1) Dissonance and Familiarity Awareness, analyzing model behavior to classify information as novel, familiar, or dissonant; and (2) Targeted Network Updates, which track neural activity to identify frequently used (stubborn) and rarely used (plastic) neurons. Through carefully designed experiments in controlled settings, we uncover a number of empirical findings demonstrating the potential of this approach. First, dissonance detection is feasible using simple activation and gradient features, suggesting potential for cognitive-inspired training. Second, we find that non-dissonant updates largely preserve prior knowledge regardless of targeting strategy, revealing inherent robustness in LLM knowledge integration. Most critically, we discover that dissonant updates prove catastrophically destructive to the model's knowledge base, indiscriminately affecting even information unrelated to the current updates. This suggests fundamental limitations in how neural networks handle contradictions and motivates the need for new approaches to knowledge updating that better mirror human cognitive mechanisms.
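The Targeted Network Updates component described in the abstract could be sketched as follows: track how often each neuron fires on a probe set, then partition neurons into "stubborn" (frequently used, to be protected) and "plastic" (rarely used, safe to update) groups. This is a minimal sketch under assumed choices (ReLU layer, firing threshold of 0, median split), not the paper's implementation.

```python
import numpy as np

def activation_frequency(activations: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Fraction of probe inputs on which each neuron's post-nonlinearity
    activation exceeds threshold. activations: (num_inputs, num_neurons)."""
    return (activations > threshold).mean(axis=0)

def split_neurons(freq: np.ndarray, plastic_quantile: float = 0.5):
    """Neurons below the quantile cutoff are 'plastic'; the rest 'stubborn'.
    The median split is an illustrative assumption."""
    cutoff = np.quantile(freq, plastic_quantile)
    plastic = np.flatnonzero(freq < cutoff)
    stubborn = np.flatnonzero(freq >= cutoff)
    return plastic, stubborn

# Toy probe: 256 inputs through a 32-neuron ReLU layer with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))
w = rng.normal(size=(16, 32))
acts = np.maximum(x @ w, 0.0)  # ReLU activations

freq = activation_frequency(acts)
plastic, stubborn = split_neurons(freq)
print(len(plastic), len(stubborn))  # the two groups together cover all 32 neurons
```

A targeted update would then restrict gradient steps to the `plastic` indices, which is how the abstract's non-dissonant updates can leave frequently used ("stubborn") knowledge largely intact.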