🤖 AI Summary
This study investigates functional specialization between memory storage and generalization in large language models (LLMs). To this end, we train small-scale LLMs on a controllable synthetic dataset and, for the first time, identify spatially segregated neuron populations supporting memorization versus generalization at the single-neuron level. Leveraging representational-separability modeling and fine-grained neuron-activation analysis, we classify the model's behavior with high accuracy (>92%). Furthermore, we propose a gradient-guided, inference-time targeted intervention that substantially increases selection of the target behavior (+38%) without modifying any parameters. Our core contribution is empirical, causal evidence that LLMs exhibit detectable, predictable, and steerable functional neural differentiation. This work establishes a mechanistic framework for probing and modulating cognitive behaviors in LLMs at the neural-circuit level, bridging interpretability research with causal cognitive neuroscience.
📝 Abstract
In this paper, we explore the foundational mechanisms of memorization and generalization in Large Language Models (LLMs), inspired by the functional specialization observed in the human brain. Our investigation serves as a case study leveraging specially designed datasets and experimental-scale LLMs to lay the groundwork for understanding these behaviors. Specifically, we first train LLMs on the designed dataset so that they exhibit both memorization and generalization, and then (a) examine whether LLMs show neuron-level spatial differentiation for memorization and generalization, (b) predict these behaviors from the model's internal representations, and (c) steer the behaviors through inference-time interventions. Our findings reveal that neuron-wise differentiation of memorization and generalization is observable in LLMs, and that targeted interventions can successfully direct their behavior.
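As a toy illustration of the kind of neuron-level analysis and inference-time steering described above, the sketch below scores neurons by how well their activations separate two behaviors and then nudges a hidden state along the most separating neurons. All names, numbers, and the mean-gap scoring rule are hypothetical stand-ins; the paper's actual method is gradient-guided and operates on a trained LLM, not synthetic vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 64

# Synthetic hidden states for two behaviors; by construction,
# neurons 0-4 fire more strongly for "generalization" examples.
mem = rng.normal(0.0, 1.0, size=(200, n_neurons))
gen = rng.normal(0.0, 1.0, size=(200, n_neurons))
gen[:, :5] += 2.0

# Score each neuron by its mean-activation gap between behaviors
# (a simple stand-in for the paper's gradient-guided selection).
gap = gen.mean(axis=0) - mem.mean(axis=0)
top = np.argsort(-np.abs(gap))[:5]

def steer(hidden, strength=2.0):
    """Nudge the selected neurons toward the generalization direction."""
    out = hidden.copy()
    out[top] += strength * np.sign(gap[top])
    return out

def probe(hidden):
    """Linear probe along the gap direction: higher = more 'generalization'."""
    return float(hidden @ gap)

h = mem[0]  # a memorization-leaning hidden state
print(sorted(top.tolist()))            # the separating neurons recovered
print(probe(steer(h)) > probe(h))      # steering raises the probe score
```

Because steering adds `strength * sign(gap)` on the selected neurons, the probe score increases by a strictly positive amount, mirroring the claim that targeted activation-level interventions can direct behavior without touching model parameters.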