🤖 AI Summary
Problem: Supervised contrastive learning and VAE-based methods for brain MRI lesion detection have limited clinical applicability because annotated lesion samples are severely scarce. Method: We propose the first fully unsupervised contrastive analysis framework that leverages only healthy brain MRI data. A self-supervised contrastive encoder learns anatomically invariant representations, which then condition a diffusion model to generate high-fidelity “purely healthy” reconstructions. Crucially, contrastive representation learning is coupled with conditional diffusion generation by imposing structural constraints in the latent space and using data augmentation to approximate target patterns. Results: Evaluated on facial images and three independent brain MRI datasets, our method achieves significantly improved reconstruction fidelity and boosts abnormality classification accuracy by 3.2–5.8%. It also enables high-precision, interpretable anomaly localization without requiring any lesion annotations.
📝 Abstract
Contrastive Analysis (CA) addresses the problem of identifying patterns in images that distinguish a background (BG) dataset (e.g., healthy subjects) from a target (TG) dataset (e.g., unhealthy subjects). Recent works on this topic rely on variational autoencoders (VAEs) or contrastive learning strategies to learn, in a supervised manner, the patterns that separate TG samples from BG samples. However, this dependency on target (unhealthy) samples is problematic in medical scenarios, where such samples are scarce. In addition, the blurry reconstructions produced by VAEs limit their utility and interpretability. In this work, we redefine the CA task: a self-supervised contrastive encoder learns a latent representation that encodes only the common patterns of input images, using samples exclusively from the BG dataset during training and approximating the distribution of the target patterns through data augmentation. We then condition state-of-the-art generative methods, namely diffusion models, on the learned latent representation to produce a realistic (healthy) version of the input image that encodes solely the common patterns. Thorough validation on a facial image dataset and experiments on three brain MRI datasets demonstrate that conditioning state-of-the-art generative methods on the latent representation from our self-supervised contrastive encoder improves both the quality of the generated images and the accuracy of image classification. The code is available at https://github.com/CristianoPatricio/unsupervised-contrastive-cond-diff.
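
The idea of training a contrastive encoder on BG samples only, with augmentation standing in for target patterns, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the InfoNCE-style loss, the hypothetical `add_synthetic_pattern` augmentation (a pasted bright square), and the random linear projection used as a toy stand-in for the encoder. Pairing each BG image with its augmented copy as a positive encourages embeddings that ignore the pasted pattern and keep only what the two views share.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss; row i of z_a and row i of z_b form a positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives sit on the diagonal


def add_synthetic_pattern(image, rng, size=4):
    """Hypothetical augmentation: paste a bright square to mimic a target pattern."""
    out = image.copy()
    height, width = out.shape[-2:]
    row = rng.integers(0, height - size + 1)
    col = rng.integers(0, width - size + 1)
    out[..., row:row + size, col:col + size] = out.max()
    return out


rng = np.random.default_rng(0)
bg_images = rng.random((8, 16, 16))                       # toy stand-in for BG (healthy) images
augmented = np.stack([add_synthetic_pattern(img, rng) for img in bg_images])

# Toy encoder (assumption): a fixed random linear projection to a 32-d latent space.
projection = rng.normal(size=(16 * 16, 32))


def encode(batch):
    return batch.reshape(len(batch), -1) @ projection


loss_aligned = info_nce_loss(encode(bg_images), encode(augmented))
# Mismatched pairs (each image paired with another image's augmentation) for comparison.
loss_shuffled = info_nce_loss(encode(bg_images), encode(np.roll(augmented, 1, axis=0)))
print(f"aligned pairs: {loss_aligned:.3f}, shuffled pairs: {loss_shuffled:.3f}")
```

In a full pipeline the encoder would be trained to minimize such a loss, and its output would then serve as the conditioning signal for the diffusion model; that second stage is not sketched here.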