Interpretable Diffusion Models with B-cos Networks

📅 2025-07-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models often suffer from poor semantic alignment between prompts and generated images, and there is no automated mechanism for detecting such semantic failures. To address this, the paper proposes an intrinsically interpretable diffusion architecture built on B-cos modules, with a condition-driven feature disentanglement mechanism that enables precise, token-level attribution from prompt words to the corresponding image pixel regions during denoising. By integrating B-cos computation with attention-aware feature decomposition, the method produces clear, verifiable semantic-spatial correspondence maps without post-hoc explanation techniques. While preserving generation fidelity, it enables fine-grained semantic editing and fully automatic semantic-consistency diagnosis. This advances model transparency, controllability, and trustworthiness for controllable generation and human-AI collaboration.

📝 Abstract
Text-to-image diffusion models generate images by iteratively denoising random noise, conditioned on a prompt. While these models have enabled impressive progress in image generation, they often fail to accurately reflect all semantic information described in the prompt -- failures that are difficult to detect automatically. In this work, we introduce a diffusion model architecture built with B-cos modules that offers inherent interpretability. Our approach provides insight into how individual prompt tokens affect the generated image by producing explanations that highlight the pixel regions influenced by each token. We demonstrate that B-cos diffusion models can produce high-quality images while providing meaningful insights into prompt-image alignment.
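As a rough illustration of the B-cos computation the abstract builds on: a B-cos unit scales each weight-normalized linear response by the cosine of the angle between the input and the weight vector, raised to the power B − 1, so strongly aligned inputs pass through while misaligned ones are suppressed. A minimal NumPy sketch, following the general B-cos formulation from the literature rather than this paper's exact implementation (the function name and the B = 2 default are illustrative):

```python
import numpy as np

def bcos_linear(x, W, B=2, eps=1e-9):
    # Unit-normalize each weight row so the dot product equals
    # ||x|| * cos(angle(x, w)).
    W_hat = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    lin = W_hat @ x                        # w_hat . x for each unit
    cos = lin / (np.linalg.norm(x) + eps)  # cos(angle(x, w_hat))
    # B-cos: scale the linear response by |cos|^(B-1); for B = 1 this
    # reduces to an ordinary (weight-normalized) linear layer.
    return np.abs(cos) ** (B - 1) * lin
```

Larger B concentrates each unit's output on inputs that align with its weight vector, which is what makes the resulting network dynamic linear and hence exactly decomposable into input contributions.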
Problem

Research questions and friction points this paper is trying to address.

Improving prompt-image alignment in diffusion models
Enhancing interpretability of text-to-image generation
Detecting semantic failures in generated images
Innovation

Methods, ideas, or system contributions that make the work stand out.

B-cos modules enable interpretable diffusion models
Token-specific pixel influence explanations provided
Maintains high-quality image generation
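The token-specific pixel-influence explanations above rest on the dynamic-linear property of B-cos networks: for each input the network computes an exact linear map y = W(x)·x, so every output decomposes exactly into per-input contributions. A hedged sketch of that decomposition, where `W_dyn` stands in for the effective linear map (in the paper the inputs would be prompt-token features and the outputs pixel regions):

```python
import numpy as np

def token_contributions(W_dyn, x):
    # For a dynamic-linear model y = W_dyn @ x, the contribution of
    # input entry j to output i is W_dyn[i, j] * x[j]; summing over j
    # recovers each output exactly (a complete decomposition).
    contrib = W_dyn * x[None, :]
    assert np.allclose(contrib.sum(axis=1), W_dyn @ x)
    return contrib
```

Because the decomposition is exact rather than approximate, no post-hoc attribution method (gradients, perturbations) is needed to link tokens to pixels.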
Nicola Bernold
Department of Computer Science, ETH Zürich, Switzerland
Moritz Vandenhirtz
PhD student, ETH Zurich
Generative Modeling · Interpretable Machine Learning · Computer Vision · Medical Data Science
Alice Bizeul
ETH Zürich
Artificial Intelligence
Julia E. Vogt
Department of Computer Science, ETH Zürich, Switzerland