CoInD: Enabling Logical Compositions in Diffusion Models

📅 2025-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical limitation of conditional diffusion models in generating samples satisfying arbitrary logical combinations (AND/OR/NOT) of independent attributes: their implicit reliance on attribute independence assumptions leads to poor generalization to unseen combinations—especially those involving negation (NOT)—and distorted generation under sparse training. We propose the first framework that explicitly models attribute statistical independence. Our method introduces Fisher divergence regularization to align the joint attribute distribution with the product of its marginals, integrates conditional feature disentanglement, and employs a logic-composition-driven training objective. Experiments on multi-attribute image generation demonstrate substantial improvements in logical consistency and controllability. Notably, our approach outperforms baseline methods by over 35% in both FID and logical fidelity metrics—particularly for NOT operations and sparse logical combinations—while maintaining high sample quality and precise attribute control.

Technology Category

Application Category

📝 Abstract
How can we learn generative models to sample data with arbitrary logical compositions of statistically independent attributes? The prevailing solution is to sample from distributions expressed as a composition of attributes' conditional marginal distributions under the assumption that they are statistically independent. This paper shows that standard conditional diffusion models violate this assumption, even when all attribute compositions are observed during training. And, this violation is significantly more severe when only a subset of the compositions is observed. We propose CoInD to address this problem. It explicitly enforces statistical independence between the conditional marginal distributions by minimizing Fisher's divergence between the joint and marginal distributions. The theoretical advantages of CoInD are reflected in both qualitative and quantitative experiments, demonstrating a significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. The benefit is more pronounced for scenarios that current solutions relying on the assumption of conditionally independent marginals struggle with, namely, logical compositions involving the NOT operation and when only a subset of compositions are observed during training.
Problem

Research questions and friction points this paper is trying to address.

Learning generative models for arbitrary logical compositions of attributes
Addressing violation of statistical independence in conditional diffusion models
Improving sample generation for NOT operations and partial composition observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enforces statistical independence in diffusion models
Minimizes Fisher's divergence for joint distributions
Improves generation for NOT operations and partial compositions
🔎 Similar Papers
No similar papers found.