🤖 AI Summary
To address the low posterior sampling efficiency of diffusion models in downstream tasks, this paper proposes a self-supervised fine-tuning framework based on *h*-transform estimation. The method introduces the first path-level *h*-transform estimator via iterative importance reweighting—requiring neither labeled data nor reinforcement learning signals—to recalibrate the reverse process and enable amortized conditional sampling. Through self-supervised iterative optimization, it significantly enhances conditional generation capability. Evaluated on class-conditional image generation and text-to-image reward-based fine-tuning, the approach achieves a 12% reduction in FID and a 35% decrease in sampling steps while preserving high-fidelity output. This work departs from conventional reparameterization and RL-based fine-tuning paradigms, offering a novel, efficient, and scalable strategy for adapting diffusion models to downstream applications.
📝 Abstract
Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using the $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by estimating the $h$-transform, enabling amortised conditional sampling. Our method iteratively refines the $h$-transform using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling and reward fine-tuning for text-to-image diffusion models.