🤖 AI Summary
This work addresses the diversity collapse and lineage degeneracy inherent in resampling-based sequential Monte Carlo (SMC) sampling for diffusion models. It introduces Fleming–Viot population dynamics into the inference phase for the first time, proposing an alignment method that requires neither value function estimation nor trajectory replay. The approach combines reward-guided independent survival decisions with stochastic rejuvenation noise in a novel birth–death mechanism, preserving trajectory diversity and achieving distributional alignment while maintaining high parallelism. Experiments show that the method outperforms existing approaches by 7% on the ImageReward metric on DrawBench, improves FID by 14–20% on class-conditional generation tasks, and reaches inference speeds up to 66 times faster than value-based methods.
📝 Abstract
We introduce Fleming–Viot Diffusion (FVD), an inference-time alignment method that resolves the diversity collapse commonly observed in Sequential Monte Carlo (SMC)-based diffusion samplers. Existing SMC-based diffusion samplers often rely on multinomial resampling or closely related schemes, which can reduce diversity and lead to lineage collapse under strong selection pressure. Inspired by Fleming–Viot population dynamics, FVD replaces multinomial resampling with a birth–death mechanism specialized for diffusion alignment. To handle cases where rewards are only approximately available and naive rebirth would collapse deterministic trajectories, FVD integrates independent reward-based survival decisions with stochastic rebirth noise. This yields flexible population dynamics that preserve broader trajectory support while effectively exploring reward-tilted distributions, all without requiring value function approximation or costly rollouts. FVD is fully parallelizable and scales efficiently with inference compute. Empirically, it achieves substantial gains across settings: on DrawBench it outperforms prior methods by 7% in ImageReward, while on class-conditional tasks it improves FID by roughly 14–20% over strong baselines and runs up to 66 times faster than value-based approaches.
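The birth–death idea described above — independent reward-based survival decisions, with killed particles reborn at the position of a survivor plus rejuvenation noise — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual algorithm: the function name, the max-normalized survival probabilities, the temperature parameter, and the Gaussian jitter scale are all assumptions chosen to make the mechanism concrete.

```python
import numpy as np

def fv_birth_death_step(particles, rewards, rng, kill_temp=1.0, rebirth_noise=0.05):
    """One hypothetical Fleming-Viot birth-death step (illustrative sketch).

    Each particle independently survives with a probability derived from its
    exponentiated reward; killed particles are reborn at the position of a
    uniformly chosen survivor, plus small Gaussian rejuvenation noise so that
    duplicated trajectories do not collapse onto a single point.
    """
    particles = np.asarray(particles, dtype=float)
    rewards = np.asarray(rewards, dtype=float)

    # Independent survival decisions: map rewards to (0, 1] survival probabilities.
    # (Max-normalization is one simple choice; the paper's tilting may differ.)
    w = np.exp((rewards - rewards.max()) / kill_temp)
    survive = rng.random(len(particles)) < w  # best particle survives w.p. 1

    survivors = np.flatnonzero(survive)
    dead = np.flatnonzero(~survive)

    # Rebirth: each killed particle copies a random survivor and is jittered,
    # instead of multinomially resampling the whole population.
    new = particles.copy()
    parents = rng.choice(survivors, size=len(dead))
    new[dead] = particles[parents] + rebirth_noise * rng.standard_normal(
        particles[dead].shape
    )
    return new, survive
```

Because every survival decision is independent and rebirths only read from survivors, the step is trivially vectorizable across the population, which is consistent with the parallelism claim in the abstract; unlike multinomial resampling, surviving trajectories are never discarded wholesale.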