Information Theoretic Learning for Diffusion Models with Warm Start

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models suffer from slow convergence in maximum-likelihood learning and lack rigorous theoretical foundations. To address this, we propose a unified information-theoretic learning framework. Our key innovation extends the classical relationship between KL divergence and Fisher information to arbitrary noise perturbations, including non-Gaussian and structured noise, yielding a tighter upper bound on the negative log-likelihood (NLL) and an interpretable entropy-mismatch bound. The method integrates seamlessly with standard diffusion training pipelines, supports both continuous and discrete data modeling, and formalizes the diffusion process as a generalized communication channel. Experiments demonstrate competitive NLL on CIFAR-10, state-of-the-art performance on multi-resolution ImageNet generation without data augmentation, and strong generalization to discrete data domains.

📝 Abstract
Generative models that maximize model likelihood have gained traction in many practical settings. Among them, perturbation-based approaches underpin many strong likelihood estimation models, yet they often face slow convergence and limited theoretical understanding. In this paper, we derive a tighter likelihood bound for noise-driven models to improve both the accuracy and efficiency of maximum-likelihood learning. Our key insight extends the classical relationship between KL divergence and Fisher information to arbitrary noise perturbations, going beyond the Gaussian assumption and enabling structured noise distributions. This formulation allows flexible use of randomized noise distributions that naturally account for sensor artifacts, quantization effects, and data-distribution smoothing, while remaining compatible with standard diffusion training. Treating the diffusion process as a Gaussian channel, we further express the mismatched entropy between data and model, showing that the proposed objective upper bounds the negative log-likelihood (NLL). In experiments, our models achieve competitive NLL on CIFAR-10 and state-of-the-art results on ImageNet across multiple resolutions, all without data augmentation, and the framework extends naturally to discrete data.
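For context, the classical Gaussian-noise relationship that the paper generalizes can be stated as follows (standard notation, not necessarily the paper's): under Gaussian smoothing, the growth of entropy is governed by Fisher information (de Bruijn's identity), and the KL divergence between two densities evolving under the same Gaussian channel decays at a rate given by their relative Fisher information.

```latex
% Classical Gaussian-channel identities (the paper extends these
% beyond Gaussian perturbations).
\begin{align}
  Y_t &= X + \sqrt{t}\, Z, \qquad Z \sim \mathcal{N}(0, I)
      \ \text{independent of } X, \\
  \frac{\mathrm{d}}{\mathrm{d}t}\, h(Y_t)
      &= \tfrac{1}{2}\, J(Y_t)
      \qquad \text{(de Bruijn's identity)}, \\
  \frac{\mathrm{d}}{\mathrm{d}t}\, D(p_t \,\|\, q_t)
      &= -\tfrac{1}{2}\,
         \mathbb{E}_{p_t}\!\left[\,
           \bigl\| \nabla \log \tfrac{p_t}{q_t} \bigr\|^2
         \right],
\end{align}
```

where $h$ is differential entropy, $J$ is Fisher information, and $p_t$, $q_t$ are the data and model densities after the same Gaussian smoothing. The abstract's claim is that analogues of these identities hold for arbitrary (non-Gaussian, structured) noise, which is what yields the tighter NLL bound.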
Problem

Research questions and friction points this paper is trying to address.

Improving accuracy and efficiency of maximum likelihood learning in diffusion models
Extending the KL divergence–Fisher information relationship beyond the Gaussian noise assumption
Enabling flexible noise distributions for sensor artifacts and quantization effects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the KL divergence–Fisher information relationship to arbitrary noise perturbations
Uses flexible randomized noise distributions for artifacts
Treats diffusion process as Gaussian channel
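To make the baseline these contributions build on concrete, below is a toy NumPy sketch of standard Gaussian denoising score matching, the objective that conventional diffusion training uses under the Gaussian noise assumption. This is illustrative only: it is not the paper's generalized objective, and the function names are ours.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score-matching loss for Gaussian noise of scale sigma.

    Estimates E_z || sigma * score_fn(x + sigma*z) + z ||^2, which is
    minimized when score_fn is the score of the sigma-smoothed density.
    """
    z = rng.standard_normal(x.shape)
    x_tilde = x + sigma * z                     # Gaussian perturbation
    return np.mean((sigma * score_fn(x_tilde) + z) ** 2)

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(100_000)                # toy data: x ~ N(0, 1)

# True score of the smoothed density N(0, 1 + sigma^2):
true_score = lambda y: -y / (1.0 + sigma**2)
wrong_score = lambda y: np.zeros_like(y)        # ignores the data entirely

loss_true = dsm_loss(true_score, x, sigma, rng)
loss_wrong = dsm_loss(wrong_score, x, sigma, rng)
```

The true score attains the lower loss (analytically 1/(1 + sigma^2) versus 1 for the zero score on this toy distribution). The paper's framework replaces the Gaussian perturbation here with arbitrary structured noise while keeping the training loop compatible with this pipeline.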
Authors
Yirong Shen (Imperial College London)
Lu Gan (Brunel University of London)
Cong Ling (Imperial College London, Coding and Crypto)