AI Summary
This work addresses the challenge of achieving precise constrained generation under complex logical expressions in diffusion model inference. The authors propose LOGDIFF, a novel framework that, for the first time, establishes exact guidance conditions for conjunctions and disjunctions within diffusion models. By compiling Boolean logical formulas into sub-circuits exhibiting conditional independence or mutual exclusivity, LOGDIFF integrates classifier-guided and classifier-free guidance strategies. It leverages guidance signals from atomic attributes and posterior probability estimates to enable recursive, controllable generation. Experiments on both image and protein structure generation tasks demonstrate the method's high fidelity in satisfying complex logical constraints.
Abstract
We propose LOGDIFF (Logical Guidance for the Exact Composition of Diffusion Models), a guidance framework for diffusion models that enables principled constrained generation with complex logical expressions at inference time. We study when exact score-based guidance for complex logical formulas can be obtained from guidance signals associated with atomic properties. First, we derive an exact Boolean calculus that provides a sufficient condition for exact logical guidance. Specifically, if a formula admits a circuit representation in which conjunctions combine conditionally independent subformulas and disjunctions combine subformulas that are either conditionally independent or mutually exclusive, exact logical guidance is achievable. In this case, the guidance signal can be computed exactly from atomic scores and posterior probabilities using an efficient recursive algorithm. Moreover, we show that, for commonly encountered classes of distributions, any desired Boolean formula is compilable into such a circuit representation. Second, by combining atomic guidance scores with posterior probability estimates, we introduce a hybrid guidance approach that bridges classifier guidance and classifier-free guidance, applicable to both compositional logical guidance and standard conditional generation. We demonstrate the effectiveness of our framework on multiple image and protein structure generation tasks.
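To make the recursive computation concrete, here is a minimal sketch of how a guidance score for a formula could be assembled from atomic scores and posterior probabilities. It is not the authors' implementation: the names `Atom`, `And`, `Or`, and `guide` are hypothetical, and the combination rules are simply what the product and sum rules give under the stated assumptions (conditional independence for conjunctions, conditional independence or mutual exclusivity for disjunctions). Each atom is assumed to come with an estimate of p(a | x_t) and of the gradient of log p(a | x_t) with respect to x_t.

```python
from dataclasses import dataclass
from typing import Union

import numpy as np


@dataclass
class Atom:
    prob: float        # assumed estimate of the posterior p(a | x_t)
    score: np.ndarray  # assumed estimate of grad_x log p(a | x_t)


@dataclass
class And:
    left: "Node"   # children assumed conditionally independent given x_t
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"
    exclusive: bool = False  # True: mutually exclusive; False: cond. independent


Node = Union[Atom, And, Or]


def guide(node: Node):
    """Return (p(formula | x_t), grad_x log p(formula | x_t)) recursively."""
    if isinstance(node, Atom):
        return node.prob, node.score
    p_l, s_l = guide(node.left)
    p_r, s_r = guide(node.right)
    if isinstance(node, And):
        # Independence: p = p_l * p_r, so the log-gradients add.
        return p_l * p_r, s_l + s_r
    if node.exclusive:
        # Mutual exclusivity: p = p_l + p_r, a probability-weighted average.
        p = p_l + p_r
        return p, (p_l * s_l + p_r * s_r) / p
    # Independent disjunction: p = 1 - (1 - p_l)(1 - p_r).
    p = p_l + p_r - p_l * p_r
    grad_p = p_l * (1 - p_r) * s_l + p_r * (1 - p_l) * s_r
    return p, grad_p / p  # assumes p > 0


# Toy usage with made-up numbers for two atoms in a 2-D sample space.
a = Atom(0.5, np.array([1.0, 0.0]))
b = Atom(0.2, np.array([0.0, 1.0]))
p_and, s_and = guide(And(a, b))
p_or, s_or = guide(Or(a, b))
p_xor, s_xor = guide(Or(a, b, exclusive=True))
```

The sketch shows why disjunctions need posterior probabilities while conjunctions of independent subformulas need only scores: a conjunction sums the child scores, whereas a disjunction weights them by how likely each child currently is.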