🤖 AI Summary
Technical differences in image acquisition, such as vendor-specific hardware and variations in acquisition parameters, induce substantial domain shift in medical imaging and hinder the unsupervised discovery of biological phenotypes. To address this, we propose a label-free latent-space rotation disentanglement framework: an autoencoder first learns image representations, and a learnable orthogonal rotation matrix is then optimized post hoc to geometrically separate technical from biological variation in the latent space. Evaluated on multi-center clinical data, the method improves cluster consistency across acquisition settings compared with the entangled representation (ARI +19.01%, NMI +16.85%, Dice +12.39%) and outperforms four state-of-the-art harmonization (batch-effect correction) methods. When the resulting clusters are used to quantify tissue composition, they also improve Cox survival prediction for patients with idiopathic pulmonary fibrosis. The core contribution is label-free disentanglement of biological and technical factors through a geometrically constrained latent-space rotation, offering a new approach to interpretable representation learning in heterogeneous medical imaging.
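The geometric idea behind the rotation can be illustrated with a small sketch. This is not the authors' method: the framework optimizes a learnable rotation post hoc, whereas the hypothetical helper `shift_aligned_rotation` below uses a closed-form stand-in (an orthonormal basis from a QR decomposition). It assumes the technical factor is only a mean offset between two acquisition domains with known domain labels, and it runs on synthetic latent codes rather than autoencoder outputs.

```python
import numpy as np


def shift_aligned_rotation(z: np.ndarray, domain: np.ndarray) -> np.ndarray:
    """Hypothetical closed-form stand-in for a learned latent rotation.

    Returns a proper rotation Q (Q.T @ Q = I, det Q = +1) such that, in the
    rotated codes z @ Q, the mean shift between domain 0 and domain 1 lies
    entirely on coordinate 0 and coordinates 1..d-1 carry no mean shift.
    Assumes d >= 2 and a nonzero mean shift.
    """
    d = z.shape[1]
    shift = z[domain == 1].mean(axis=0) - z[domain == 0].mean(axis=0)
    u = shift / np.linalg.norm(shift)
    # QR of [u | I] gives an orthonormal basis whose first column is +/- u.
    q, _ = np.linalg.qr(np.column_stack([u, np.eye(d)]))
    if q[:, 0] @ u < 0:  # orient the "technical" axis along +u
        q[:, 0] = -q[:, 0]
    if np.linalg.det(q) < 0:  # flip another axis so Q is a rotation, not a reflection
        q[:, -1] = -q[:, -1]
    return q


# Synthetic latent codes: two "scanners", the second offset by a random vector.
rng = np.random.default_rng(0)
n, d = 200, 8
domain = np.repeat([0, 1], n // 2)
z = rng.normal(size=(n, d))
z[domain == 1] += rng.normal(size=d)  # simulated acquisition offset

q = shift_aligned_rotation(z, domain)
z_rot = z @ q
technical, biological = z_rot[:, :1], z_rot[:, 1:]  # split rotated coordinates
print("mean shift per rotated axis:",
      np.round(z_rot[domain == 1].mean(0) - z_rot[domain == 0].mean(0), 6))
```

Because Q is orthogonal, pairwise distances in the latent space are unchanged; the rotation only re-expresses the codes in axes where the technical and remaining variation are separated. A learned version would replace the closed form with an optimized orthogonal matrix (for example via PyTorch's `torch.nn.utils.parametrizations.orthogonal`) and could capture effects beyond a mean offset.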
📝 Abstract
Identifying new disease-related patterns in medical imaging data with the help of machine learning enlarges the vocabulary of recognizable findings, which supports diagnostic and prognostic assessment. However, image appearance varies not only due to biological differences but also due to imaging technology linked to vendors and to scanning or reconstruction parameters. The resulting domain shifts impede data representation learning strategies and the discovery of biologically meaningful clusters. To address these challenges, we introduce an approach that actively learns the domain shift via post-hoc rotation of the latent space, enabling disentanglement of biological and technical factors. Results on real-world heterogeneous clinical data show that the learned disentangled representation leads to stable clusters representing tissue types across different acquisition settings. Cluster consistency is improved by +19.01% (ARI), +16.85% (NMI), and +12.39% (Dice) compared to the entangled representation, outperforming four state-of-the-art harmonization methods. When the clusters are used to quantify tissue composition in patients with idiopathic pulmonary fibrosis, the learned profiles improve Cox survival prediction. This indicates that the proposed label-free framework facilitates biomarker discovery in multi-center routine imaging data. Code is available on GitHub: https://github.com/cirmuw/latent-space-rotation-disentanglement.
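The abstract reports cluster consistency as ARI and NMI (plus Dice) but does not state exactly how these were computed. A minimal pure-Python sketch follows, assuming consistency means comparing the cluster assignments of the same samples under two acquisition settings. The function names and toy labels are hypothetical, NMI uses arithmetic-mean normalization (scikit-learn's default), and Dice is left out because the abstract does not specify how clusters were matched for it.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(a, b):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(a)
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ab - expected) / (maximum - expected)


def normalized_mutual_info(a, b):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # p(x, y) * log(p(x, y) / (p(x) p(y))), with counts substituted for probabilities
    mi = sum(c / n * log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in Counter(zip(a, b)).items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mi / denom


# Hypothetical cluster labels for the same six patches under two acquisition settings.
setting_a = [0, 0, 1, 1, 2, 2]
setting_b = [1, 1, 0, 0, 2, 2]  # same partition, different cluster names
print(adjusted_rand_index(setting_a, setting_b))
print(normalized_mutual_info(setting_a, setting_b))
```

Both scores are invariant to renaming clusters, which matters here because independent clusterings of scans from different settings have no shared label names; only the grouping of samples is compared.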