AI Summary
This work addresses the challenge of identifying the temporal window during which semantic structure emerges in diffusion models, a key limitation for fine-grained control over the generative process. The authors propose class-conditional entropy as a reliable indicator of semantic specialization, uniquely integrating information-theoretic and statistical physics perspectives. By tracking entropy dynamics in noisy latent states, they pinpoint the critical noise interval where semantics transition from ambiguous to well-defined. Their approach combines high-dimensional Gaussian mixture modeling, entropy computation, and a quantitative analysis of information redistribution induced by guidance mechanisms. Experiments on EDM2-XS and Stable Diffusion 1.5 demonstrate that the identified entropy-decay phase aligns with the timescale of symmetry breaking, enabling temporally localized interpretation and control of semantic decision-making in diffusion trajectories.
Abstract
Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. By restricting the entropy to semantic partitions, the entropy can furthermore resolve semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation symmetry-breaking instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical physics perspectives on diffusion and provide a principled basis for time-localized control.
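To make the central quantity concrete, the following is a minimal sketch of how the class-conditional entropy H(C | x_t) could be estimated in a toy Gaussian mixture. It is not the authors' implementation. The function name `class_conditional_entropy`, the equal class weights, the isotropic covariance, the forward-process parameterization x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) eps, and the Monte Carlo estimator are all assumptions chosen for illustration.

```python
import numpy as np


def class_conditional_entropy(t, means, sigma0=1.0, n_samples=4000, seed=0):
    """Monte Carlo estimate of H(C | x_t), in nats, for a toy mixture.

    Illustrative assumptions (not taken from the paper): equal class weights,
    isotropic class covariance sigma0^2 * I, and a variance-preserving forward
    process x_t = exp(-t) * x_0 + sqrt(1 - exp(-2t)) * eps.
    """
    rng = np.random.default_rng(seed)
    k, d = means.shape
    a = np.exp(-t)                               # signal scale at time t
    var = a**2 * sigma0**2 + (1.0 - a**2)        # per-coordinate variance of x_t given the class

    # Draw noisy states x_t from the mixture marginal at time t.
    labels = rng.integers(k, size=n_samples)
    x = a * means[labels] + np.sqrt(var) * rng.standard_normal((n_samples, d))

    # Class posterior p(c | x_t) via Bayes' rule; shared constants cancel.
    sq_dist = ((x[:, None, :] - a * means[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * sq_dist / var
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)

    # Average the posterior entropy over the sampled noisy states.
    per_sample = -(post * np.log(np.clip(post, 1e-300, None))).sum(axis=1)
    return float(per_sample.mean())


# Two symmetric classes at +m and -m in d dimensions (hypothetical toy setup).
d = 64
m = np.ones(d)
means = np.stack([m, -m])

for t in (0.01, 1.0, 2.0, 3.0, 5.0):
    h = class_conditional_entropy(t, means)
    print(f"t = {t:4.2f}   H(C | x_t) = {h:.4f} nats")
```

In this toy setup the entropy stays near zero at small t, where the classes are well separated, and approaches log 2 at large t, where the noisy state carries almost no class information. The change is concentrated around the time at which the class signal e^{-t} ||m|| becomes comparable to the unit noise scale, that is t ≈ log ||m|| = (1/2) log d for this choice of m, which illustrates the kind of logarithmic time scale the abstract refers to.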