🤖 AI Summary
This work addresses the limited interpretability of deep learning models in medical imaging and the failure of conventional concept bottleneck models to account for structured dependencies among clinical concepts. The authors propose DCG-Net, an end-to-end interpretable framework that uses a dual cross-attention mechanism to align visual features with textual concepts at a fine-grained level. DCG-Net also introduces a parametric concept graph, initialized with Positive Pointwise Mutual Information priors and refined through sparsity-controlled message passing, to explicitly model contextual relationships among clinical concepts. The approach is presented as the first integration of structured semantic priors into concept bottleneck models. Evaluated on white blood cell morphology classification and skin lesion diagnosis, DCG-Net is reported to achieve state-of-the-art performance while generating clinically coherent and interpretable diagnostic rationales.
📝 Abstract
Deep learning models have achieved strong performance in medical image analysis, but their internal decision processes remain difficult to interpret. Concept Bottleneck Models (CBMs) partially address this limitation by structuring predictions through human-interpretable clinical concepts. However, existing CBMs typically overlook the contextual dependencies among concepts. To address this gap, we propose DCG-Net, an end-to-end interpretable framework that integrates multimodal alignment with structured concept reasoning. DCG-Net introduces a Dual Cross-Attention module that replaces cosine similarity matching with bidirectional attention between visual tokens and canonicalized textual concept-value prototypes, enabling spatially localized evidence attribution. To capture the relational structure inherent to clinical concepts, we develop a Parametric Concept Graph initialized with Positive Pointwise Mutual Information (PPMI) priors and refined through sparsity-controlled message passing. This formulation models inter-concept dependencies in a manner consistent with clinical domain knowledge. Experiments on white blood cell morphology and skin lesion diagnosis demonstrate that DCG-Net achieves state-of-the-art classification performance while producing clinically interpretable diagnostic explanations.
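The abstract does not say how the PPMI priors are computed, so the following is only a minimal sketch of the standard PPMI definition, PPMI(i, j) = max(0, log(p(i, j) / (p(i) p(j)))), applied to concept co-occurrence. It assumes a binary annotation matrix with one row per training image and one column per concept value. The function name `ppmi_matrix`, the input layout, and the choice to zero the diagonal are illustrative assumptions, not details from the paper.

```python
import numpy as np


def ppmi_matrix(annotations: np.ndarray) -> np.ndarray:
    """Return a (num_concepts, num_concepts) PPMI matrix.

    `annotations` is assumed to be a binary (num_samples, num_concepts)
    matrix where entry [s, c] is 1 if concept c is present in sample s.
    This input format is a hypothetical choice for illustration.
    """
    annotations = np.asarray(annotations, dtype=np.float64)
    num_samples = annotations.shape[0]

    # Empirical joint probability p(i, j) of two concepts co-occurring.
    joint = annotations.T @ annotations / num_samples
    # Empirical marginal probability p(i) of each concept.
    marginal = annotations.mean(axis=0)
    # Co-occurrence probability expected under independence, p(i) * p(j).
    expected = np.outer(marginal, marginal)

    # PMI = log(p(i, j) / (p(i) p(j))); pairs that never co-occur give
    # -inf or NaN, which are mapped to 0 before clipping.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / expected)
    pmi[~np.isfinite(pmi)] = 0.0

    # Keep only positive associations (the "positive" in PPMI).
    ppmi = np.maximum(pmi, 0.0)
    # Assumption: drop self-loops so the prior only encodes relations
    # between distinct concepts.
    np.fill_diagonal(ppmi, 0.0)
    return ppmi


# Toy example: concepts 0 and 1 always co-occur, concept 2 appears alone.
toy = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1],
                [0, 0, 1]])
prior = ppmi_matrix(toy)
```

In a setup like the one the abstract describes, a matrix of this kind could serve as the initial edge weights of a learnable concept graph, with training then adjusting and sparsifying the edges. How DCG-Net performs that refinement is not specified in the abstract.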