Grokking Finite-Dimensional Algebra

📅 2026-02-23
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
This work investigates the phenomenon of grokking in neural networks when learning bilinear multiplication operations of finite-dimensional algebras, including non-associative, non-commutative, and non-unital cases. By representing algebraic products via structure tensors and integrating real-valued matrix factorization with discrete representation learning over finite fields, the study dynamically tracks the alignment between embedding spaces and intrinsic algebraic representations. It extends the unified framework of grokking for the first time to general finite-dimensional algebras, revealing how algebraic properties—such as associativity and commutativity—and structural characteristics of the associated tensors—including rank and sparsity—critically influence the timing and extent of generalization. This provides a rigorous mathematical perspective on the mechanisms underlying neural network generalization.
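To make the structure-tensor view concrete (a minimal sketch, not the paper's code): in a d-dimensional algebra, the product can be written as (x ∘ y)_k = Σ_{i,j} T[i,j,k]·x_i·y_j for a fixed structure tensor T. Here is that bilinear product for the complex numbers viewed as a 2-dimensional algebra over ℝ with basis (1, i):

```python
import numpy as np

# Structure tensor T for C as a 2-dimensional algebra over R,
# with basis (1, i): (x ∘ y)_k = sum_{i,j} T[i, j, k] * x_i * y_j.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1.0   # 1 * 1 = 1
T[0, 1, 1] = 1.0   # 1 * i = i
T[1, 0, 1] = 1.0   # i * 1 = i
T[1, 1, 0] = -1.0  # i * i = -1

def bilinear_product(x, y, T):
    """Multiply two algebra elements via the structure tensor."""
    return np.einsum('i,j,ijk->k', x, y, T)

# (1 + 2i) * (3 + 4i) = 3 + 10i - 8 = -5 + 10i
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(bilinear_product(x, y, T))  # [-5. 10.]
```

A network that learns this multiplication is, in effect, fitting the bilinear map defined by T, which is what connects the learning problem to tensor rank and matrix factorization.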

📝 Abstract
This paper investigates the grokking phenomenon, which refers to the sudden transition from a long phase of memorization to generalization observed during neural network training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of algebraic elements. This leads us to experimentally investigate the following core questions: (i) how do algebraic properties such as commutativity, associativity, and unitality influence both the emergence and timing of grokking, (ii) how do structural properties of the FDA's structure tensor, such as sparsity and rank, influence generalization, and (iii) to what extent does generalization correlate with the model learning latent embeddings aligned with the algebra's representation. Our work provides a unified framework for grokking across algebraic structures and new insights into how mathematical structure governs neural network generalization dynamics.
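For the finite-field case, the classic grokking protocol (as in prior work on group operations) trains on a random fraction of an operation table and tests on the held-out pairs. A hypothetical sketch of that data setup for multiplication in F_p, which the abstract frames as a special case of an FDA product:

```python
import itertools
import random

# Sketch of the finite-field setup: multiplication in F_p is a special
# case of an FDA product, and the model must learn discrete embeddings
# for the p elements. (Illustrative only; p and the split are assumptions.)
p = 7  # a small prime, so Z/pZ is a field

# Full multiplication table as (a, b, a*b mod p) triples.
table = [(a, b, (a * b) % p) for a, b in itertools.product(range(p), range(p))]

# Train on a random half of the table; test generalization on the rest.
random.seed(0)
random.shuffle(table)
split = len(table) // 2
train, test = table[:split], table[split:]

print(len(train), len(test))  # 24 25
```

Grokking then shows up as test accuracy on the held-out pairs jumping long after the training pairs have been memorized.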
Problem

Research questions and friction points this paper is trying to address.

grokking
finite-dimensional algebras
generalization
structure tensor
neural network dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

grokking
finite-dimensional algebras
structure tensor
bilinear product
generalization dynamics
Pascal Jr Tikeng Notsawo
Université de Montréal, Montréal, Quebec, Canada; Mila, Quebec AI Institute, Montréal, Quebec, Canada; CHU Sainte-Justine Research Center, Montréal, Quebec, Canada
Guillaume Dumas
Associate Professor, CHUSJ/Mila, University of Montreal
Hyperscanning, Neurodynamics, Precision Psychiatry, Social AI, SciML
Guillaume Rabusseau
Assistant Professor - Canada CIFAR AI Chair, Université de Montréal / Mila
Machine Learning, Tensors, Weighted Automata, Tensor Networks