Sparsely Supervised Diffusion

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the issue of global structural inconsistency in diffusion models, which often arises from the locality of their denoising mechanisms. To mitigate this, the authors propose a sparsely supervised learning approach that masks up to 98% of pixels during training, retaining only critical contextual information to guide globally coherent generation. The method is efficiently implementable with minimal code modifications, fully compatible with standard denoising objectives, and substantially reduces pixel-level supervision while alleviating training instability on small datasets. Experimental results demonstrate that the proposed framework achieves competitive FID scores across multiple benchmarks, effectively suppresses overfitting and memorization, and significantly enhances the global consistency of generated images.

Technology Category

Application Category

📝 Abstract
Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield samples that are locally plausible but globally inconsistent. To mitigate this issue, we propose sparsely supervised learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Interestingly, the experiments show that it is safe to mask up to 98\% of pixels during diffusion model training. Our method delivers competitive FID scores across experiments and, most importantly, avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of essential contextual information during generation.
Problem

Research questions and friction points this paper is trying to address.

diffusion models
spatial inconsistency
global coherence
denoising mechanisms
generative tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparsely supervised learning
diffusion models
masking strategy
spatial consistency
contextual information
🔎 Similar Papers
No similar papers found.