🤖 AI Summary
Diffusion models suffer from high computational cost and unstable gradients (e.g., gradient explosion) during reward-guided end-to-end alignment because of their long denoising chains. To address this, we propose a shortcut-based fine-tuning framework built on a trajectory-preserving few-step diffusion model. The few-step model provides a shortcut over the original denoising chain, yielding a much shorter chain through which reward gradients can be backpropagated end to end while generation quality is preserved. By combining reward-guided gradient optimization with this lightweight chain structure, the approach decouples alignment performance from the number of denoising steps required by conventional end-to-end training. Experiments across diverse reward functions show that the method surpasses state-of-the-art approaches in alignment fidelity, achieves 2.1–3.8× higher training efficiency, and reduces gradient variance by 47%, improving stability and scalability.
📝 Abstract
Backpropagation-based approaches align diffusion models with reward functions by backpropagating the reward gradient end to end through the denoising chain, offering a promising direction. However, because the lengthy denoising chain incurs high computational cost and risks gradient explosion, existing approaches struggle to backpropagate gradients through the full chain and therefore yield suboptimal results. In this paper, we introduce Shortcut-based Fine-Tuning (ShortFT), an efficient fine-tuning strategy that operates on a shorter denoising chain. Specifically, we employ a recently proposed trajectory-preserving few-step diffusion model, which provides a shortcut over the original denoising chain, and use it to construct a shortcut-based denoising chain of much shorter length. Optimizing over this chain markedly improves both the efficiency and the effectiveness of fine-tuning the foundation model. Our method has been rigorously evaluated across a variety of reward functions, significantly improving alignment performance and surpassing state-of-the-art alternatives.
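To see why chain length matters for reward backpropagation, consider a deliberately toy sketch (not the paper's method): a scalar "denoiser" `x_{t-1} = (1 + theta) * x_t` rolled out for `n_steps`, with reward `R(x0) = -(x0 - target)^2`. Reverse-mode differentiation must traverse every step, and the per-step Jacobian factor `(1 + theta)` compounds, so the reward gradient w.r.t. `theta` grows roughly like `(1 + theta)^(2n-1)`. All names here (`rollout_and_grad`, the linear step) are hypothetical illustrations, not the ShortFT implementation.

```python
# Toy illustration (NOT the paper's algorithm): end-to-end reward
# backpropagation through a scalar linear "denoising chain".
# Each step applies x_next = (1 + theta) * x, so the reverse pass
# multiplies by (1 + theta) once per step -- long chains compound it.

def rollout_and_grad(theta, x_T, n_steps, target):
    # Forward pass: record every intermediate state of the chain.
    xs = [x_T]
    for _ in range(n_steps):
        xs.append(xs[-1] * (1.0 + theta))
    x0 = xs[-1]
    reward = -(x0 - target) ** 2

    # Reverse pass: accumulate dR/dtheta through every chain step.
    dR_dx = -2.0 * (x0 - target)   # dR/dx0
    grad_theta = 0.0
    for x in reversed(xs[:-1]):
        grad_theta += dR_dx * x    # local d(x_next)/d(theta) = x
        dR_dx *= (1.0 + theta)     # local d(x_next)/d(x) = 1 + theta
    return reward, grad_theta

# A long chain vs. a short ("shortcut-length") chain with the same step rule:
_, grad_long = rollout_and_grad(theta=0.1, x_T=1.0, n_steps=50, target=0.0)
_, grad_short = rollout_and_grad(theta=0.1, x_T=1.0, n_steps=4, target=0.0)
print(abs(grad_long) / abs(grad_short))  # gradient magnitude blows up with length
```

The short chain keeps the reward gradient well-scaled, which is the intuition behind fine-tuning through a shortcut-based chain rather than the full-length one; the real method additionally requires the few-step model to preserve the original sampling trajectory so that the shortcut endpoint matches the foundation model's output.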