BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses Model Autophagy Disorder (MAD), the model collapse that can arise when models are trained on diffusion-generated images, which often exhibit visual inconsistencies. It proposes BlendFusion, a scalable framework that generates synthetic data from 3D scenes using path tracing, combined with an object-centric camera placement strategy, filtering mechanisms, and automatic captioning to produce image-caption pairs. The pipeline is used to curate FineBLEND, an image-caption dataset built from a diverse set of 3D scenes. The authors empirically analyze the quality of FineBLEND against several widely used image-caption datasets and compare object-centric camera placement with object-agnostic sampling. The framework is open source and designed to be highly configurable.

📝 Abstract

With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit visual inconsistencies, and training models on such data can create an autophagous feedback loop that leads to model collapse, commonly referred to as Model Autophagy Disorder (MAD). To address these challenges, we propose BlendFusion, a scalable framework for synthetic data generation from 3D scenes using path tracing. Our pipeline incorporates an object-centric camera placement strategy, robust filtering mechanisms, and automatic captioning to produce high-quality image-caption pairs. Using this pipeline, we curate FineBLEND, an image-caption dataset constructed from a diverse set of 3D scenes. We empirically analyze the quality of FineBLEND and compare it to several widely used image-caption datasets. We also demonstrate the effectiveness of our object-centric camera placement strategy relative to object-agnostic sampling approaches. Our open-source framework is designed for high configurability, enabling the community to create their own datasets from 3D scenes.
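
The abstract does not spell out how the object-centric camera placement works, so the following is only a minimal sketch of one plausible reading: pick a target object, then sample a viewpoint on a hemisphere around it at a distance where the object's bounding sphere fits inside the field of view. The names `BoundingBox` and `object_centric_camera`, and the `fov_deg`, `margin`, and elevation-range parameters, are hypothetical and are not taken from the BlendFusion code base.

```python
import math
import random
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class BoundingBox:
    """Axis-aligned bounding box of a (hypothetical) scene object."""
    lo: Vec3
    hi: Vec3

    @property
    def center(self) -> Vec3:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))

    @property
    def radius(self) -> float:
        # Half the box diagonal, i.e. the radius of the enclosing sphere.
        return math.dist(self.lo, self.hi) / 2


def object_centric_camera(box, fov_deg=50.0, margin=1.2, rng=random):
    """Sample a camera pose that looks at the object (assumed strategy).

    Returns (position, forward), where forward is a unit vector pointing
    from the camera to the centre of the object's bounding box.
    """
    # A sphere of radius r fits in a cone of half-angle t at distance r / sin(t).
    half_fov = math.radians(fov_deg) / 2
    distance = margin * box.radius / math.sin(half_fov)

    # Random direction on the upper hemisphere around the object.
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    elevation = rng.uniform(math.radians(10), math.radians(60))

    cx, cy, cz = box.center
    position = (
        cx + distance * math.cos(elevation) * math.cos(azimuth),
        cy + distance * math.cos(elevation) * math.sin(azimuth),
        cz + distance * math.sin(elevation),
    )
    forward = tuple((c - p) / distance for c, p in zip(box.center, position))
    return position, forward
```

An object-agnostic baseline would instead draw the camera position and direction without reference to any object, so many views may frame empty space or clip objects. A sketch like this makes the contrast concrete, but the actual pipeline may handle occlusion, scene bounds, and framing differently.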

Problem

Research questions and friction points this paper is trying to address.

synthetic data generation

diffusion models

model collapse

Model Autophagy Disorder

visual inconsistencies

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data generation

diffusion models

object-centric camera placement