Optimization Benchmark for Diffusion Models on Dynamical Systems

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses optimization challenges in training a diffusion model that denoises flow trajectories. It systematically benchmarks Muon, SOAP, AdamW, and SGD across several learning-rate schedules. Muon and SOAP, which the study brings to diffusion model optimization, prove to be highly efficient alternatives to AdamW: they reach a final loss 18% lower than AdamW's, converge faster, and train more stably. The study also reports that the performance gap between Adam and SGD is markedly narrower than in typical deep learning settings, and it reassesses the impact of learning-rate scheduling. Experiments use a standard diffusion architecture for denoising flow trajectories, and the analysis covers training dynamics, convergence behavior, and generalization performance. The results provide empirical evidence for more efficient and robust training strategies for diffusion models.

📝 Abstract

The training of diffusion models is often absent in the evaluation of new optimization techniques. In this work, we benchmark recent optimization algorithms for training a diffusion model for denoising flow trajectories. We observe that Muon and SOAP are highly efficient alternatives to AdamW (18% lower final loss). We also revisit several recent phenomena related to the training of models for text or image applications in the context of diffusion model training. This includes the impact of the learning-rate schedule on the training dynamics, and the performance gap between Adam and SGD.
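
To make the two quantities in the abstract concrete, here is a minimal Python sketch of how a relative final-loss reduction (such as the reported 18%) could be computed and how two learning-rate schedules could be compared. Everything in it is an illustrative assumption rather than the paper's setup: the function names, the choice of a constant and a cosine schedule, and the toy quadratic objective are all hypothetical, and plain gradient descent stands in for the optimizers that were actually benchmarked.

```python
import math


def constant_lr(step, total_steps, base_lr):
    """Constant schedule: the same learning rate at every step."""
    return base_lr


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at the last step."""
    progress = step / max(1, total_steps - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def relative_loss_reduction(baseline_loss, candidate_loss):
    """Fraction by which candidate_loss is lower than baseline_loss."""
    return (baseline_loss - candidate_loss) / baseline_loss


def run_sgd(schedule, total_steps=100, base_lr=0.1, w0=1.0):
    """Gradient descent on the toy objective f(w) = 0.5 * w**2.

    This is a stand-in for a real training run, not a diffusion model.
    Returns the final loss.
    """
    w = w0
    for step in range(total_steps):
        lr = schedule(step, total_steps, base_lr)
        grad = w  # derivative of 0.5 * w**2
        w -= lr * grad
    return 0.5 * w * w


if __name__ == "__main__":
    # Final loss of the toy problem under each (assumed) schedule.
    for name, schedule in [("constant", constant_lr), ("cosine", cosine_lr)]:
        print(name, run_sgd(schedule))

    # A candidate final loss of 0.82 against a baseline of 1.0 corresponds
    # to an 18% reduction; these numbers are made up for illustration.
    print(relative_loss_reduction(1.0, 0.82))
```

In a real benchmark the toy objective would be replaced by the diffusion training loss, and each optimizer would be run with every schedule under the same step budget before the final losses are compared.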

Problem

Research questions and friction points this paper is trying to address.

Benchmarking optimization algorithms for diffusion model training

Evaluating efficient alternatives to AdamW for denoising trajectories

Analyzing learning-rate impacts and Adam-SGD performance gaps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking optimization algorithms for diffusion model training

Evaluating Muon and SOAP as efficient AdamW alternatives

Revisiting learning-rate impacts on diffusion training dynamics

🔎 Similar Papers

Operator-informed score matching for Markov diffusion models