🤖 AI Summary
Diffusion models often face trade-offs in multi-objective alignment (e.g., multi-reward optimization) and multi-model composition: it is hard to satisfy every requirement at once while staying consistent with the pre-trained models. This paper proposes a constrained optimization framework for diffusion models that unifies alignment and composition by requiring the resulting model to satisfy explicit reward constraints and/or remain close to one or more pre-trained models. Theoretically, the paper characterizes the solutions of the constrained alignment and composition problems; algorithmically, it develops a Lagrangian-based primal-dual training algorithm that approximates these solutions for both multi-reward alignment and multi-model composition. Image generation experiments show that the aligned or composed models satisfy the specified constraints and improve on equally weighted baselines.
📝 Abstract
Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. To improve the quality of generated samples and their compliance with user requirements, two commonly used methods are: (i) Alignment, which fine-tunes a diffusion model to align it with a reward; and (ii) Composition, which combines several pre-trained diffusion models, each emphasizing a desirable attribute in the generated outputs. However, trade-offs often arise when optimizing for multiple rewards or combining multiple models, since the rewards or models can represent competing properties. Existing methods cannot guarantee that the resulting model faithfully generates samples with all the desired properties. To address this gap, we propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to (potentially multiple) pre-trained models. We provide a theoretical characterization of the solutions to the constrained alignment and composition problems and develop a Lagrangian-based primal-dual training algorithm to approximate these solutions. Empirically, we demonstrate the effectiveness and merits of our approach in image generation, applying it to both alignment and composition, and show that the aligned or composed model satisfies the constraints and improves on the equally weighted approach. Our implementation can be found at https://github.com/shervinkhalafi/constrained_comp_align.
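To make the idea of reward-constrained alignment with a Lagrangian-based dual update concrete, here is a minimal toy sketch. It is not the paper's training algorithm. It assumes a finite sample space instead of a diffusion model, a KL objective to a reference distribution `q`, and expected-reward constraints of the form E_p[r_i] >= b_i. All names (`tilted`, `primal_dual`, `thresholds`, `step`) and the numbers are illustrative assumptions. In this simplified setting the Lagrangian minimizer for fixed multipliers has the closed form p(x) ∝ q(x)·exp(Σ_i λ_i r_i(x)), so only the dual variables need iterative updates.

```python
import math

def tilted(q, rewards, lams):
    """Minimizer of KL(p || q) - sum_i lam_i * E_p[r_i] over distributions p:
    p(x) is proportional to q(x) * exp(sum_i lam_i * r_i(x))."""
    logits = [
        math.log(qx) + sum(l * r[x] for l, r in zip(lams, rewards))
        for x, qx in enumerate(q)
    ]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logits]
    z = sum(weights)
    return [w / z for w in weights]

def expected(p, r):
    """Expected reward E_p[r]."""
    return sum(px * rx for px, rx in zip(p, r))

def primal_dual(q, rewards, thresholds, step=0.5, iters=1000):
    """Projected dual ascent: raise lam_i while constraint i is violated,
    lower it (never below zero) while the constraint has slack."""
    lams = [0.0] * len(rewards)
    for _ in range(iters):
        p = tilted(q, rewards, lams)  # exact primal step in this toy setting
        lams = [
            max(0.0, l + step * (b - expected(p, r)))
            for l, r, b in zip(lams, rewards, thresholds)
        ]
    return tilted(q, rewards, lams), lams

# Hypothetical toy problem: 4 outcomes, 2 competing rewards.
q = [0.4, 0.3, 0.2, 0.1]                  # stand-in for the pre-trained model
rewards = [[0, 0, 1, 1], [0, 1, 0, 1]]    # r_1(x), r_2(x)
thresholds = [0.5, 0.5]                   # required expected rewards b_1, b_2

p, lams = primal_dual(q, rewards, thresholds)
print("aligned distribution:", [round(v, 4) for v in p])
print("multipliers:", [round(v, 4) for v in lams])
print("expected rewards:", [round(expected(p, r), 4) for r in rewards])
```

The multipliers act as learned weights on the rewards. Instead of fixing equal weights in advance, each weight grows only as far as needed to meet its threshold, which is the contrast with an equally weighted combination that the abstract draws.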