🤖 AI Summary
Fine-grained anatomical structures (e.g., cardiac chambers, interventricular septum) are challenging to identify and disentangle in echocardiographic images without supervision. Method: We propose the first self-supervised concept-style disentanglement pretraining framework for echocardiography, built upon a variational autoencoder. It introduces a discrete latent space with a predefined number of concepts, jointly optimized via a concept discretization loss, style orthogonality constraints, and a local consistency reconstruction loss, complemented by ultrasound-specific data augmentation. Contribution/Results: The method achieves interpretable disentanglement of anatomical concepts and local stylistic variations, enabling concept-level retrieval, out-of-distribution (OOD) detection, and controllable image synthesis. It consistently outperforms state-of-the-art self-supervised methods on region retrieval, segmentation, OOD detection, and object detection, accurately localizing critical anatomical structures and generating high-fidelity synthetic images that preserve anatomical concepts while swapping styles.
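To make the loss terms above concrete, here is a minimal NumPy sketch of how a VQ-style concept discretization loss and a style orthogonality penalty might be combined. All names, shapes, and the exact loss forms are illustrative assumptions for exposition, not the paper's actual implementation (the real model would use learnable embeddings and backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 8, 16, 100                # concepts, embedding dim, spatial locations (assumed)
codebook = rng.normal(size=(K, D))  # concept embeddings (learnable in the real model)
feats = rng.normal(size=(N, D))     # per-location encoder features
styles = rng.normal(size=(N, D))    # per-location style vectors

# Concept discretization: assign each feature to its nearest concept embedding.
d2 = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
assign = d2.argmin(axis=1)
quantized = codebook[assign]

# Discretization loss: pull features toward their assigned concept (VQ-style).
loss_disc = ((feats - quantized) ** 2).mean()

# Style orthogonality: styles should carry no concept information, so
# penalize the projection of each style onto its assigned concept direction.
unit_concepts = quantized / np.linalg.norm(quantized, axis=1, keepdims=True)
loss_orth = ((styles * unit_concepts).sum(axis=1) ** 2).mean()

total = loss_disc + loss_orth
```

A full training objective would add the reconstruction and local-consistency terms on top of `total`; this fragment only illustrates the discretization and orthogonality components.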
📝 Abstract
While traditional self-supervised learning methods improve performance and robustness across various medical tasks, they rely on single-vector embeddings that may not capture fine-grained concepts such as anatomical structures or organs. The ability to identify such concepts and their characteristics without supervision has the potential to improve pre-training methods, and enable novel applications such as fine-grained image retrieval and concept-based outlier detection. In this paper, we introduce ConceptVAE, a novel pre-training framework that detects and disentangles fine-grained concepts from their style characteristics in a self-supervised manner. We present a suite of loss terms and model architecture primitives designed to discretise input data into a preset number of concepts along with their local style. We validate ConceptVAE both qualitatively and quantitatively, demonstrating its ability to detect fine-grained anatomical structures such as blood pools and septum walls from 2D cardiac echocardiographies. Quantitatively, ConceptVAE outperforms traditional self-supervised methods in tasks such as region-based instance retrieval, semantic segmentation, out-of-distribution detection, and object detection. Additionally, we explore the generation of in-distribution synthetic data that maintains the same concepts as the training data but with distinct styles, highlighting its potential for more calibrated data generation. Overall, our study introduces and validates a promising new pre-training technique based on concept-style disentanglement, opening multiple avenues for developing models for medical image analysis that are more interpretable and explainable than black-box approaches.