🤖 AI Summary
Text-to-image diffusion models often suffer from poor semantic alignment between prompts and generated images, and no automated mechanism exists for detecting such semantic failures. To address this, we propose the first intrinsically interpretable diffusion architecture, built on B-cos neural units and a condition-driven feature disentanglement mechanism that enables precise, token-level attribution from prompt words to the corresponding image pixel regions during denoising. By integrating B-cos computation with attention-aware feature decomposition, our method produces clear, verifiable semantic-spatial correspondence maps without post-hoc explanation techniques. While preserving generation fidelity, it enables fine-grained semantic editing and fully automatic semantic-consistency diagnosis, capabilities that were previously unattainable. This advances model transparency, controllability, and trustworthiness, establishing a new paradigm for controllable generation and human-AI collaboration.
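The B-cos computation referenced above has a simple core idea (Böhle et al., "B-cos Networks"): each linear response is scaled by the cosine similarity between input and weight raised to the power B−1, so only well-aligned inputs produce strong outputs, making the layer's decision directly attributable. A minimal NumPy sketch, not the paper's actual layer (`bcos_linear` and its arguments are illustrative names):

```python
import numpy as np

def bcos_linear(x, W, B=2.0, eps=1e-9):
    """Illustrative B-cos unit: scale each normalized linear response
    by |cos(x, w)|^(B-1), rewarding weight-input alignment."""
    W_hat = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)  # unit-norm rows
    lin = W_hat @ x                                # w_hat . x
    cos = lin / (np.linalg.norm(x) + eps)          # cos(x, w_hat)
    return lin * np.abs(cos) ** (B - 1)            # B-cos response
```

With B = 1 this reduces to an ordinary (weight-normalized) linear layer; larger B sharpens the alignment pressure, which is what makes the contribution maps interpretable.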
📝 Abstract
Text-to-image diffusion models generate images by iteratively denoising random noise, conditioned on a prompt. While these models have enabled impressive progress in image generation, they often fail to accurately reflect all semantic information described in the prompt -- failures that are difficult to detect automatically. In this work, we introduce a diffusion model architecture built with B-cos modules that offers inherent interpretability. Our approach provides insight into how individual prompt tokens affect the generated image by producing explanations that highlight the pixel regions influenced by each token. We demonstrate that B-cos diffusion models can produce high-quality images while providing meaningful insights into prompt-image alignment.
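The iterative denoising the abstract describes can be sketched generically. The following is a minimal DDPM-style reverse process (Ho et al., 2020), not the paper's architecture: `eps_model` is a hypothetical stand-in for the prompt-conditioned noise predictor, and the beta schedule values are illustrative:

```python
import numpy as np

def ddpm_sample(eps_model, shape, T=50, seed=0):
    """Minimal DDPM reverse process: start from Gaussian noise and
    iteratively subtract the predicted noise at each timestep.
    `eps_model(x, t)` stands in for a prompt-conditioned U-Net."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)      # illustrative noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps = eps_model(x, t)               # predicted noise at step t
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                           # add fresh noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x
```

In a B-cos variant of this loop, the noise predictor's internal computations would themselves yield the per-token contribution maps, rather than requiring a separate post-hoc attribution pass.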