🤖 AI Summary
To address the prevalent problem of reward collapse in diffusion model fine-tuning, this paper proposes an entropy-regularized stochastic control framework and, for the first time, rigorously extends it to general *f*-divergence regularization. Methodologically, the paper formulates a continuous-time stochastic control model, combining Itô calculus with variational inference to derive a computationally tractable and provably convergent optimal control policy. Theoretically, it establishes that the proposed regularization effectively mitigates reward collapse; empirically, the regularization significantly improves both sample quality and diversity. Key contributions include: (1) the first rigorous stochastic control analysis framework designed specifically for diffusion model fine-tuning; (2) a unified generalization of entropy regularization to arbitrary *f*-divergences, substantially enhancing methodological generality and robustness; and (3) a practical fine-tuning paradigm implementable under multiple divergence metrics.
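For concreteness, here is a minimal sketch of the entropy-regularized control problem in our own notation (pretrained drift $b$, learned control $u$, volatility $\sigma$, reward $r$, regularization weight $\beta$); the paper's precise formulation may differ:

$$
\max_{u}\; \mathbb{E}\bigl[r(X^u_T)\bigr] \;-\; \beta\,\mathrm{KL}\bigl(\mathbb{P}^u \,\|\, \mathbb{P}^{\mathrm{pre}}\bigr),
\qquad
dX^u_t = \bigl(b(X^u_t,t) + \sigma(t)\,u(X^u_t,t)\bigr)\,dt + \sigma(t)\,dW_t .
$$

By Girsanov's theorem, the KL term equals $\tfrac{1}{2}\,\mathbb{E}\int_0^T \lVert u(X^u_t,t)\rVert^2\,dt$, so the regularizer penalizes deviation of the controlled dynamics from the pretrained ones; the generalized variant replaces $\mathrm{KL}$ with an $f$-divergence $D_f(\mathbb{P}^u\,\|\,\mathbb{P}^{\mathrm{pre}})$.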
📝 Abstract
This paper develops a rigorous treatment of entropy-regularized fine-tuning in the context of continuous-time diffusion models, a problem recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to use stochastic control for sample generation, with an entropy regularizer introduced to mitigate reward collapse. We also show how the analysis extends to fine-tuning with a general $f$-divergence regularizer.
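To make the setup concrete, below is a minimal, hypothetical PyTorch sketch of how such an entropy-regularized objective could be optimized by rolling out the controlled SDE with an Euler-Maruyama scheme. The names (`pretrained_drift`, `control_net`, `reward`, `beta`) are our illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of entropy-regularized fine-tuning via stochastic
# control. All function and parameter names are illustrative assumptions.
import torch

def fine_tune_loss(pretrained_drift, control_net, reward, x0,
                   beta=0.1, sigma=1.0, T=1.0, n_steps=100):
    r"""One Euler-Maruyama rollout of the controlled SDE
        dX_t = (b(X_t, t) + sigma * u(X_t, t)) dt + sigma dW_t,
    returning -[ reward(X_T) - (beta/2) * \int ||u||^2 dt ] for minimization.
    By Girsanov, the quadratic running cost is the KL (entropy) penalty.
    """
    dt = T / n_steps
    x = x0
    kl_penalty = torch.zeros(x0.shape[0], device=x0.device)
    for k in range(n_steps):
        t = torch.full((x0.shape[0], 1), k * dt, device=x0.device)
        u = control_net(x, t)       # learned control (trainable)
        b = pretrained_drift(x, t)  # frozen pretrained drift/score
        noise = torch.randn_like(x)
        x = x + (b + sigma * u) * dt + sigma * noise * dt ** 0.5
        # Accumulate the entropy (KL) penalty along the trajectory.
        kl_penalty = kl_penalty + 0.5 * (u ** 2).sum(dim=-1) * dt
    # Maximize reward minus beta * KL  <=>  minimize the negative.
    return -(reward(x) - beta * kl_penalty).mean()
```

Replacing the quadratic running cost with the cost induced by a general $f$-divergence would give the corresponding generalized variant; the paper derives the optimal control rigorously in that setting.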