🤖 AI Summary
This work challenges the conventional view that rote memorization impedes generalization, asking whether large language models (LLMs) can derive semantic generalization from mechanical memorization. Method: We propose a two-stage “memorize-then-generalize” framework: (1) a memorization stage, in which factual subject-object triplets are encoded via a semantically meaningless masked token; and (2) a generalization stage, in which fine-tuning on a small set of semantically meaningful prompts, with no new factual data, triggers reinterpretation of the memorized associations. Contribution/Results: Across eight mainstream LLMs, we provide the first empirical evidence that structured, semantically aligned representations emerge in latent space, enabling effective and efficient knowledge injection and cross-prompt generalization. Our findings reveal a mechanistic link between memory and understanding in LLMs and flag the risk that rote-memorized data could be repurposed adversarially.
📝 Abstract
Rote learning is a memorization technique based on repetition. It is commonly believed to hinder generalization by encouraging verbatim recall rather than deeper understanding. This belief persists even for factual knowledge, which inevitably requires some degree of memorization. In this work, we demonstrate that LLMs can be trained to generalize from rote-memorized data. We introduce a two-phase memorize-then-generalize framework, in which the model first rote memorizes factual subject-object associations using a semantically meaningless token and then learns to generalize by fine-tuning on a small set of semantically meaningful prompts. Extensive experiments on 8 LLMs show that the models can reinterpret the rote-memorized data through the semantically meaningful prompts, as evidenced by the emergence of structured, semantically aligned latent representations between the memorized data and the prompts. This surprising finding opens the door to effective and efficient knowledge injection, but also to the risk of repurposing memorized data for malicious use.
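To make the two-phase setup concrete, here is a minimal sketch of how the training data for each phase might be constructed. The marker token `<mem>`, the capital-city facts, and the prompt template are illustrative assumptions for exposition, not the paper's actual token or dataset.

```python
# Minimal sketch of the memorize-then-generalize data construction.
# The "<mem>" marker and the example facts are hypothetical placeholders.

MEM_TOKEN = "<mem>"  # assumed semantically meaningless relation token

def memorization_example(subject: str, obj: str) -> str:
    """Phase 1: serialize a subject-object association with the
    meaningless token standing in for the relation."""
    return f"{subject} {MEM_TOKEN} {obj}"

def semantic_example(subject: str, obj: str, template: str) -> str:
    """Phase 2: phrase the same fact in natural language so the model
    can reinterpret what the memorized token encodes."""
    return template.format(subject=subject, obj=obj)

facts = [("France", "Paris"), ("Japan", "Tokyo")]

# Phase 1: rote memorization over all facts.
phase1 = [memorization_example(s, o) for s, o in facts]

# Phase 2: only a small set of semantically meaningful prompts is needed.
phase2 = [semantic_example(s, o, "The capital of {subject} is {obj}.")
          for s, o in facts[:1]]
```

The key design point the abstract emphasizes is the asymmetry between the phases: phase 1 can cover many facts cheaply, while phase 2 uses only a handful of meaningful prompts to unlock generalization across the memorized set.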