🤖 AI Summary
This study investigates how diffusion models generalize to discrete symbolic operations, such as modular addition, within continuous image generation. Using diffusion models trained with flow-matching objectives, the work combines mechanistic interpretability analysis, periodic representation modeling, and critical timestep detection. It shows that these models exhibit grokking, that is, delayed generalization after overfitting. In a single-image regime, the model performs modular addition by composing periodic representations of the operands. In a diverse-image regime, it splits sampling into an arithmetic computation phase followed by a visual denoising phase, separated by a critical timestep. These findings clarify how diffusion models embed symbolic reasoning within a continuous generative framework.
📝 Abstract
Despite the empirical success of diffusion models, how they generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhibit grokking (delayed generalization after overfitting) on modular addition, enabling controlled analysis of their internal computations. We study this phenomenon in two data regimes. In a single-image regime, mechanistic dissection reveals that the model implements modular addition by composing periodic representations of individual operands. In a diverse-image regime with high intraclass variability, we find that the model leverages its iterative sampling process to partition the task into an arithmetic computation phase followed by a visual denoising phase, separated by a critical timestep threshold. Our work provides a mechanistic decomposition of algorithmic learning in diffusion models, revealing how these models bridge continuous pixel-space generation and discrete symbolic reasoning.
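To make "composing periodic representations of individual operands" concrete, the sketch below shows one way modular addition can be computed from periodic features alone. It is an illustration under stated assumptions, not the circuit extracted in the paper: the function names `periodic_embed` and `modular_add`, the choice of frequencies, and the argmax readout are all hypothetical. Each operand is mapped to cosine and sine features, the angle-addition identities combine them into features of a + b, and the result is read out by matching against the features of every candidate residue.

```python
import numpy as np

def periodic_embed(x, p, freqs):
    """Map integer residue(s) x to cos/sin features, one row per frequency."""
    angles = 2 * np.pi * np.outer(freqs, np.atleast_1d(x)) / p
    return np.cos(angles), np.sin(angles)

def modular_add(a, b, p, freqs=None):
    """Compute (a + b) mod p using only periodic features of a and b.

    Hypothetical illustration: the frequencies and the readout are
    assumptions, not values taken from the paper.
    """
    if freqs is None:
        freqs = np.arange(1, p)  # assumed default: all nonzero frequencies
    freqs = np.asarray(freqs)

    cos_a, sin_a = periodic_embed(a, p, freqs)
    cos_b, sin_b = periodic_embed(b, p, freqs)

    # Angle-addition identities give features of (a + b) without ever
    # forming the integer sum.
    cos_ab = cos_a * cos_b - sin_a * sin_b
    sin_ab = sin_a * cos_b + cos_a * sin_b

    # Score each candidate c by sum over frequencies of cos(w * (a + b - c)).
    cos_c, sin_c = periodic_embed(np.arange(p), p, freqs)
    logits = (cos_ab * cos_c + sin_ab * sin_c).sum(axis=0)
    return int(np.argmax(logits))
```

The score for candidate c peaks only when c is congruent to a + b modulo p, so the wraparound of modular arithmetic comes for free from the periodicity of the features. Including frequency 1, as in the default above, guarantees a unique peak for any modulus.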