The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a unified theoretical framework explaining how neural networks simultaneously learn underlying rules and memorize specific facts or exceptions. The authors propose an analytically tractable Rules-and-Facts (RAF) model that decomposes labels into a structured rule-generated component and a stochastic fact component. By integrating a teacher-student framework with Gardner capacity analysis, they establish a unified theory characterizing both generalization and memorization capabilities. The study elucidates how over-parameterization, regularization, and the choice of kernel or nonlinearity govern the allocation of capacity between rule learning and fact memorization. Furthermore, it quantifies the precise conditions under which rule recovery and fact memorization can coexist, offering a theoretical foundation for understanding how modern neural networks process structured and unstructured information.

📝 Abstract
A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while a fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule - allowing generalization to new data - and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.
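The setup described in the abstract can be sketched in a few lines. The abstract only specifies that a fraction $1 - \varepsilon$ of labels comes from a "structured teacher rule" and a fraction $\varepsilon$ consists of random labels; the linear sign teacher below is an assumed minimal instantiation for illustration, not the paper's exact model, and `raf_dataset` is a hypothetical helper name.

```python
import numpy as np

def raf_dataset(n, d, eps, rng=None):
    """Generate a toy Rules-and-Facts training set: a fraction 1 - eps
    of labels follows a linear teacher rule, a fraction eps is random.
    (Assumed instantiation; the paper's teacher may differ.)"""
    rng = np.random.default_rng(rng)
    teacher = rng.standard_normal(d)      # hypothetical teacher weights
    X = rng.standard_normal((n, d))
    y = np.sign(X @ teacher)              # rule-generated labels
    n_facts = int(eps * n)                # number of unstructured "facts"
    fact_idx = rng.choice(n, size=n_facts, replace=False)
    y[fact_idx] = rng.choice([-1.0, 1.0], size=n_facts)  # random labels
    return X, y, teacher, fact_idx

X, y, w, facts = raf_dataset(n=1000, d=50, eps=0.1, rng=0)
```

A learner trained on `(X, y)` faces exactly the tension the paper analyzes: fitting the `facts` indices requires memorization capacity, while recovering `w` from the remaining examples is what permits generalization.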
Problem

Research questions and friction points this paper is trying to address.

generalization
memorization
neural networks
overparameterization
statistical physics of learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rules-and-Facts model
simultaneous generalization and memorization
overparameterization
statistical physics of learning
capacity allocation