🤖 AI Summary
Addressing the challenge of jointly achieving trustworthiness, interpretability, and label efficiency in deep neural networks, this paper proposes the Learnable Concept-Based Model (LCBM). LCBM operates in a fully unsupervised setting, modeling image-level semantic concepts as Bernoulli latent variables and performing interpretable classification via local linear combinations of these concepts. Its core contributions are: (i) an unsupervised concept discovery mechanism that identifies semantically coherent, human-aligned concepts; and (ii) a lightweight, high-fidelity concept embedding design that yields compact yet discriminative concept representations. Experiments show that LCBM substantially outperforms existing unsupervised concept-based models in generalization while approaching the classification accuracy of state-of-the-art black-box models. A user study further confirms that LCBM's concepts are intuitive, easy for humans to understand, and preserve rich semantic information, thereby narrowing the gap between interpretability and performance without concept supervision.
📝 Abstract
To increase the trustworthiness of deep neural networks, it is critical to improve our understanding of how they make decisions. This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM), which models concepts as random variables within a Bernoulli latent space. Unlike traditional methods that either require extensive human supervision or suffer from limited scalability, our approach employs a reduced number of concepts without sacrificing performance. We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. The proposed concept representation enhances information retention and aligns more closely with human understanding. A user study confirms that the discovered concepts are also more intuitive for humans to interpret. Finally, despite the use of concept embeddings, the model remains interpretable by means of a local linear combination of concepts.
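The abstract describes three ingredients: concepts modeled as Bernoulli variables, concept embeddings, and a prediction formed as a local (input-dependent) linear combination of concept activations. The NumPy sketch below illustrates how such a forward pass could fit together. It is only an illustration of the general concept-bottleneck pattern, not the paper's actual implementation: all dimensions, weight matrices (`W_c`, `E`, `W_gen`), and the specific weight-generator design are assumptions for the example, and the random weights stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions (not taken from the paper)
d_feat, n_concepts, d_emb, n_classes = 16, 8, 4, 3

# Backbone features for a batch of 2 images (placeholder for a real encoder)
feats = rng.normal(size=(2, d_feat))

# Concept head: features -> Bernoulli probability for each concept
W_c = rng.normal(scale=0.1, size=(d_feat, n_concepts))
p_concepts = sigmoid(feats @ W_c)             # (2, n_concepts), values in (0, 1)

# Each concept has an embedding; the image representation is a
# probability-weighted mixture of concept embeddings
E = rng.normal(size=(n_concepts, d_emb))
z = p_concepts @ E                            # (2, d_emb)

# Interpretable prediction: per-sample linear weights over concepts,
# produced by an (assumed) weight generator from the mixed representation
W_gen = rng.normal(scale=0.1, size=(d_emb, n_concepts * n_classes))
local_w = (z @ W_gen).reshape(2, n_classes, n_concepts)

# Class logits are a local linear combination of the concept activations
logits = np.einsum("bkc,bc->bk", local_w, p_concepts)
print(logits.shape)  # prints (2, 3)
```

Because each logit is a dot product between the sample's own weight vector and its concept activations, the per-concept contributions `local_w[b, k] * p_concepts[b]` can be read off directly, which is what makes this style of head interpretable despite the embedding layer.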