JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of subject disentanglement and compositional alignment in diffusion models at inference time, this paper proposes JEDI, a test-time adaptive disentanglement method that requires no retraining and no external supervision. Methodologically, JEDI introduces (1) a novel attention disentanglement objective based on Jensen–Shannon divergence that explicitly minimizes cross-subject semantic entanglement; (2) a lightweight, CLIP-free disentanglement score derived from internal attention distributions; and (3) an efficient, model-agnostic optimization procedure using adversarial gradient updates. Evaluated on Stable Diffusion 1.5 and 3.5, JEDI significantly improves prompt alignment and compositional disentanglement quality while remaining plug-and-play. Quantitative and qualitative results show consistent gains across diverse compositional generation tasks, with ablation studies confirming the contribution of each design component. The approach is computationally efficient and generalizes across architectures without modification. Code will be publicly released.

📝 Abstract
We introduce JEDI, a test-time adaptation method that enhances subject separation and compositional alignment in diffusion models without requiring retraining or external supervision. JEDI operates by minimizing semantic entanglement in attention maps using a novel Jensen–Shannon-divergence-based objective. To improve efficiency, we leverage adversarial optimization, reducing the number of update steps required. JEDI is model-agnostic and applicable to architectures such as Stable Diffusion 1.5 and 3.5, consistently improving prompt alignment and disentanglement in complex scenes. Additionally, JEDI provides a lightweight, CLIP-free disentanglement score derived from internal attention distributions, offering a principled benchmark for compositional alignment under test-time conditions. We will publicly release the implementation of our method.
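To make the core objective concrete, the following is a minimal sketch of a Jensen–Shannon-divergence-based entanglement loss over per-subject attention distributions. The paper does not specify its exact formulation; the function names (`js_divergence`, `entanglement_loss`), the pairwise averaging, and the sign convention here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions
    (e.g. flattened, normalized cross-attention maps)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    # KL(a || b) with a small epsilon for numerical stability
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def entanglement_loss(attn_maps):
    """Hypothetical objective: negative mean pairwise JS divergence over
    the attention maps of different subjects. Minimizing this loss pushes
    the subjects' attention distributions apart, reducing entanglement."""
    n = len(attn_maps)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += js_divergence(attn_maps[i], attn_maps[j])
            pairs += 1
    return -total / pairs
```

In a test-time adaptation loop, a loss of this shape would be differentiated with respect to the latent (or another adapted quantity) at selected denoising steps; the gradient-update schedule is where the paper's adversarial optimization would come in.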
Problem

Research questions and friction points this paper is trying to address.

Enhances subject separation in diffusion models
Minimizes semantic entanglement in attention maps
Improves prompt alignment in complex scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes semantic entanglement via Jensen-Shannon divergence
Uses adversarial optimization for efficiency
Provides lightweight CLIP-free disentanglement score
Authors
Eric Tillmann Bill (ETH Zurich)
Enis Simsar (ETH Zurich)
Thomas Hofmann