🤖 AI Summary
Diffusion models often produce images with missing details and structural distortions when sampling with few steps (e.g., 30) or at low guidance scales. To address this, we propose history-guided sampling (HiGS), a plug-and-play, momentum-based sampling enhancement. At each step, HiGS builds a guidance term from the difference between the current model prediction and a weighted average of past predictions, requiring neither model fine-tuning nor architectural modification. It is compatible with diverse diffusion models and architectures across sampling budgets and guidance scales. Using a pretrained SiT model, HiGS attains an FID of 1.61 for unguided ImageNet generation at 256×256 with only 30 sampling steps instead of the standard 250, reported as a new state of the art. This indicates gains in visual fidelity, structural realism, and inference efficiency.
📝 Abstract
While diffusion models have made remarkable progress in image generation, their outputs can still appear unrealistic and lack fine details, especially when using fewer neural function evaluations (NFEs) or lower guidance scales. To address this issue, we propose a novel momentum-based sampling technique, termed history-guided sampling (HiGS), which enhances the quality and efficiency of diffusion sampling by integrating recent model predictions into each inference step. Specifically, HiGS leverages the difference between the current prediction and a weighted average of past predictions to steer the sampling process toward more realistic outputs with better details and structure. Our approach introduces practically no additional computation and integrates seamlessly into existing diffusion frameworks, requiring neither extra training nor fine-tuning. Extensive experiments show that HiGS consistently improves image quality across diverse models and architectures and under varying sampling budgets and guidance scales. Moreover, using a pretrained SiT model, HiGS achieves a new state-of-the-art FID of 1.61 for unguided ImageNet generation at 256$\times$256 with only 30 sampling steps (instead of the standard 250). We thus present HiGS as a plug-and-play enhancement to standard diffusion sampling that enables faster generation with higher fidelity.
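The core idea described in the abstract, steering each step with the difference between the current prediction and a weighted average of past predictions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `history_guided_prediction`, the exponentially decaying weights, and the `weight` and `decay` parameters are all assumptions made for illustration, since the abstract does not specify the weighting scheme.

```python
import numpy as np

def history_guided_prediction(current, history, weight=0.5, decay=0.5):
    """Hypothetical sketch: push the current prediction away from a
    weighted average of past predictions.

    current: model prediction at this step (array)
    history: list of earlier predictions, oldest first
    weight:  strength of the history term (assumed hyperparameter)
    decay:   per-step decay for older predictions (assumed scheme)
    """
    if not history:
        # No past predictions yet, so there is nothing to steer with.
        return current
    n = len(history)
    # Most recent prediction gets the largest coefficient.
    coeffs = np.array([decay ** (n - 1 - i) for i in range(n)])
    coeffs = coeffs / coeffs.sum()
    past_avg = sum(c * h for c, h in zip(coeffs, history))
    # Add the scaled difference between current and averaged past predictions.
    return current + weight * (current - past_avg)
```

In a sampling loop, one would call this on the model output at each step, pass the result to the sampler's update rule, and append the raw model output to `history`. Because it only reuses predictions that are already computed, it adds no extra model evaluations, which is consistent with the abstract's claim of practically no additional computation.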