🤖 AI Summary
Addressing the challenge of jointly achieving trustworthiness, interpretability, and label efficiency in deep neural networks, this paper proposes the Learnable Concept-Based Model (LCBM). LCBM operates in a fully unsupervised setting, modeling image-level semantic concepts as Bernoulli latent variables and performing interpretable classification via local linear combinations of these concepts. Its core contributions are: (i) an unsupervised concept discovery mechanism that identifies semantically coherent, human-aligned concepts; and (ii) a lightweight, high-fidelity concept embedding design that yields compact yet discriminative concept representations. Experiments show that LCBM substantially outperforms existing unsupervised concept-based models in generalization while approaching the classification accuracy of state-of-the-art black-box models. A user study further confirms that LCBM's concepts are intuitive, easy for humans to understand, and preserve rich semantic information, thereby narrowing the gap between interpretability and performance without concept supervision.
📝 Abstract
To increase the trustworthiness of deep neural networks, it is critical to improve our understanding of how they make decisions. This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM), which models concepts as random variables within a Bernoulli latent space. Unlike traditional methods that either require extensive human supervision or suffer from limited scalability, our approach employs a reduced number of concepts without sacrificing performance. We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. The proposed concept representation enhances information retention and aligns more closely with human understanding. A user study confirms that the discovered concepts are also more intuitive for humans to interpret. Finally, despite the use of concept embeddings, the model remains interpretable by means of a local linear combination of concepts.
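The abstract describes three ingredients: concepts modeled as Bernoulli variables, concept embeddings, and a prediction formed as a local (input-dependent) linear combination of concept activations. The NumPy sketch below illustrates how such a forward pass could fit together. It is only an illustration of the general concept-bottleneck pattern, not the paper's actual implementation: all dimensions, weight matrices (`W_c`, `E`, `W_gen`), and the specific weight-generator design are assumptions for the example, and the random weights stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions (not taken from the paper)
d_feat, n_concepts, d_emb, n_classes = 16, 8, 4, 3

# Backbone features for a batch of 2 images (placeholder for a real encoder)
feats = rng.normal(size=(2, d_feat))

# Concept head: features -> Bernoulli probability for each concept
W_c = rng.normal(scale=0.1, size=(d_feat, n_concepts))
p_concepts = sigmoid(feats @ W_c)             # (2, n_concepts), values in (0, 1)

# Each concept has an embedding; the image representation is a
# probability-weighted mixture of concept embeddings
E = rng.normal(size=(n_concepts, d_emb))
z = p_concepts @ E                            # (2, d_emb)

# Interpretable prediction: per-sample linear weights over concepts,
# produced by an (assumed) weight generator from the mixed representation
W_gen = rng.normal(scale=0.1, size=(d_emb, n_concepts * n_classes))
local_w = (z @ W_gen).reshape(2, n_classes, n_concepts)

# Class logits are a local linear combination of the concept activations
logits = np.einsum("bkc,bc->bk", local_w, p_concepts)
print(logits.shape)  # prints (2, 3)
```

Because each logit is a dot product between the sample's own weight vector and its concept activations, the per-concept contributions `local_w[b, k] * p_concepts[b]` can be read off directly, which is what makes this style of head interpretable despite the embedding layer.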