🤖 AI Summary
In multi-label medical image diagnosis, concurrent diseases become semantically entangled, which leads to confused prototypes, distorted activation maps, and poor interpretability. To address this, we propose the Cross- and Intra-image Prototypical Learning (CIPL) framework. CIPL exploits semantics shared across images to disentangle co-occurring diseases during prototype learning, and adds a two-level alignment-based regularization that leverages consistent intra-image information. Together, these improve prototype discriminability and localization reliability. On two large-scale public multi-label benchmarks, chest X-rays and fundus images, CIPL achieves state-of-the-art classification performance. In weakly supervised thoracic disease localization, it also outperforms prevailing saliency-based and prototype-based explanation methods, combining high classification accuracy with strong model interpretability.
📝 Abstract
Recent advances in prototypical learning have shown remarkable potential for providing useful decision interpretations by associating activation maps and predictions with class-specific training prototypes. Such prototypical learning has been well studied for various single-label diseases. However, in the highly relevant and more challenging setting of multi-label diagnosis, where multiple diseases often co-occur within an image, existing prototypical learning models struggle to obtain meaningful activation maps and effective class prototypes because the diseases are entangled. In this paper, we present a novel Cross- and Intra-image Prototypical Learning (CIPL) framework for accurate multi-label disease diagnosis and interpretation from medical images. CIPL takes advantage of common cross-image semantics to disentangle the multiple diseases when learning the prototypes, allowing a comprehensive understanding of complicated pathological lesions. Furthermore, we propose a new two-level alignment-based regularisation strategy that effectively leverages consistent intra-image information to enhance interpretation robustness and predictive performance. Extensive experiments show that CIPL attains state-of-the-art (SOTA) classification accuracy on two public multi-label disease diagnosis benchmarks: thoracic radiography and fundus images. Quantitative interpretability results also show that CIPL outperforms other leading saliency- and prototype-based explanation methods in weakly-supervised thoracic disease localisation.
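To make the prototype-based interpretation described above more concrete, here is a minimal, generic sketch in Python (standard library only). It shows how a ProtoPNet-style model turns class-specific prototypes into activation maps and independent multi-label predictions. This is an illustration under stated assumptions, not CIPL. The log-distance similarity follows ProtoPNet. The helper names (`similarity`, `activation_map`, `predict_multilabel`), the fixed negative bias, the 0.5 threshold, the disease labels, and the toy 2×2 feature map are all hypothetical. The sketch does not implement CIPL's cross-image disentanglement or its two-level alignment regularisation, because the abstract does not give their details.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dimensional patch embeddings


def similarity(z: Vector, p: Vector, eps: float = 1e-4) -> float:
    """ProtoPNet-style similarity: large when patch z is close to prototype p."""
    d = sum((a - b) ** 2 for a, b in zip(z, p))  # squared L2 distance
    return math.log((d + 1.0) / (d + eps))


def activation_map(fmap: FeatureMap, p: Vector) -> List[List[float]]:
    """Similarity of one prototype to every spatial location of the image."""
    return [[similarity(z, p) for z in row] for row in fmap]


def predict_multilabel(
    fmap: FeatureMap,
    prototypes: Dict[str, List[Vector]],
    bias: float = -5.0,  # hypothetical; similarities are always > 0, so a
                         # negative bias is needed for a class to come out absent
    threshold: float = 0.5,
) -> Dict[str, Tuple[float, bool]]:
    """Score each class independently from its own prototypes.

    Each prototype contributes its maximum activation over the image, and a
    class logit is the sum of those maxima plus the bias. A per-class sigmoid
    (not a softmax) keeps classes independent, so several can be positive.
    """
    out: Dict[str, Tuple[float, bool]] = {}
    for cls, protos in prototypes.items():
        logit = bias + sum(
            max(max(row) for row in activation_map(fmap, p)) for p in protos
        )
        prob = 1.0 / (1.0 + math.exp(-logit))
        out[cls] = (prob, prob >= threshold)
    return out


# Toy image: one patch matches the "effusion" prototype, another the "nodule" one.
fmap = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
prototypes = {
    "effusion": [[1.0, 0.0]],
    "nodule": [[0.0, 1.0]],
    "cardiomegaly": [[5.0, 5.0]],
}
preds = predict_multilabel(fmap, prototypes)
# effusion and nodule are predicted present; cardiomegaly is not
for name, (prob, present) in preds.items():
    print(f"{name}: p={prob:.3f} present={present}")
```

A per-class sigmoid is what allows two diseases to be predicted together. The entanglement problem described in the abstract plausibly arises at training time: with only image-level labels, a prototype for one class can also absorb features of a disease that frequently co-occurs with it. The activation map would then highlight the wrong lesion even when the prediction is correct.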