Interpretable Generative Models through Post-hoc Concept Bottlenecks

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing concept bottleneck model (CBM)-based approaches to interpretable generation rely on expensive from-scratch training of generative models and large-scale concept annotations on real images, limiting their efficiency and scalability. This paper proposes a post-hoc concept bottleneck framework that requires neither retraining the generative model nor concept supervision on real images. The authors introduce a Concept Bottleneck Autoencoder (CB-AE) and a Concept Controller (CC), which operate in the latent space of a pretrained generator and generalize across model families (e.g., GANs and diffusion models). Because the bottleneck is attached post-hoc, the framework also supports zero- and few-shot concept learning. Evaluated on CelebA and other benchmarks, the methods improve interpretability and steerability by ~25% on average over prior work, train 4-15x faster, and are further validated by a large-scale user study.

📝 Abstract
Concept bottleneck models (CBM) aim to produce inherently interpretable models that rely on human-understandable concepts for their predictions. However, existing approaches to design interpretable generative models based on CBMs are not yet efficient and scalable, as they require expensive generative model training from scratch as well as real images with labor-intensive concept supervision. To address these challenges, we present two novel and low-cost methods to build interpretable generative models through post-hoc techniques, which we name the concept bottleneck autoencoder (CB-AE) and the concept controller (CC). Our proposed approaches enable efficient and scalable training without the need for real data and require minimal to no concept supervision. Additionally, our methods generalize across modern generative model families, including generative adversarial networks and diffusion models. We demonstrate the superior interpretability and steerability of our methods on numerous standard datasets like CelebA, CelebA-HQ, and CUB with large improvements (average ~25%) over prior work, while being 4-15x faster to train. Finally, a large-scale user study is performed to validate the interpretability and steerability of our methods.
Problem

Research questions and friction points this paper is trying to address.

Lack of efficient, scalable interpretable generative models
High cost of generative model training and concept supervision
Limited generalization across generative model families
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-hoc concept bottleneck autoencoder (CB-AE)
Concept controller (CC) for generative models
No real data or minimal concept supervision
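
To make the CB-AE idea concrete, here is a minimal PyTorch sketch of a post-hoc concept bottleneck attached to a frozen generator's latent space: an encoder splits a latent vector into concept logits plus a residual, concepts can be intervened on directly, and a decoder maps back to a latent for the unchanged generator. All class names, layer sizes, and the intervention interface are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ConceptBottleneckAE(nn.Module):
    """Hypothetical CB-AE sketch: encodes a generator latent w into
    interpretable concept probabilities plus an uninterpreted residual,
    and decodes (optionally intervened) concepts back to a latent."""

    def __init__(self, latent_dim=512, num_concepts=8, residual_dim=32):
        super().__init__()
        self.num_concepts = num_concepts
        self.encoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_concepts + residual_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(num_concepts + residual_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, w, intervene=None):
        h = self.encoder(w)
        concept_logits = h[:, :self.num_concepts]
        residual = h[:, self.num_concepts:]
        probs = torch.sigmoid(concept_logits)
        if intervene is not None:  # e.g. {concept_index: target_probability}
            probs = probs.clone()
            for idx, val in intervene.items():
                probs[:, idx] = val
        # Re-derive logits from (possibly edited) probabilities before decoding.
        logits = torch.logit(probs.clamp(1e-4, 1 - 1e-4))
        w_rec = self.decoder(torch.cat([logits, residual], dim=1))
        return w_rec, probs

# Usage: the pretrained generator G stays frozen; only the CB-AE is trained,
# e.g. with a latent reconstruction loss plus concept pseudo-labels from a
# zero-shot classifier (an assumption here, not the paper's exact recipe).
cbae = ConceptBottleneckAE()
w = torch.randn(4, 512)                       # batch of generator latents
w_rec, probs = cbae(w)                        # plain reconstruction
w_edit, probs_edit = cbae(w, intervene={0: 0.95})  # force concept 0 "on"
```

Decoding from re-derived logits rather than the raw encoder output is what makes the intervention mechanism work: editing a concept probability changes the decoded latent, so feeding `w_edit` to the frozen generator should steer the corresponding attribute without retraining it.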