🤖 AI Summary
Medical imaging cross-modal generalization faces two coupled challenges: modality heterogeneity and inter-individual anatomical/physiological variability (e.g., organ size, metabolic rate). Existing methods predominantly ignore subject-specific characteristics, modeling only shared anatomical patterns and thereby limiting generalization. To address this, we propose the “subject-specific invariant representation” (Xₕ) paradigm, establishing personalized modeling as a fundamental pathway for multimodal medical generalization. Our approach jointly optimizes subject-level constraints, learns biologically grounded prior embeddings, and enforces cross-modal invariance in representation learning, enabling robust adaptation to heterogeneous modalities and diverse populations. We validate the framework across segmentation, detection, and registration tasks. Results show an average 4.2% improvement in mean Dice (mDice) and a 37% reduction in cross-device and cross-center transfer error, demonstrating substantial gains in generalizability and clinical deployability.
📝 Abstract
The differences among medical imaging modalities, driven by distinct underlying physical principles, pose significant challenges for generalization in multi-modal medical tasks. Beyond modality gaps, individual variations, such as differences in organ size and metabolic rate, further impede a model's ability to generalize across both modalities and diverse populations. Despite the importance of personalization, existing approaches to multi-modal generalization often neglect individual differences, focusing solely on common anatomical features; this limitation can weaken generalization across a range of medical tasks. In this paper, we show that personalization is critical for multi-modal generalization. Specifically, we propose an approach that achieves personalized generalization by approximating the underlying personalized invariant representation ${X}_h$ across modalities, leveraging individual-level constraints and a learnable biological prior. We validate the feasibility and benefits of learning a personalized ${X}_h$, showing that this representation is highly generalizable and transferable across diverse multi-modal medical tasks. Extensive experimental results consistently show that the added personalization significantly improves performance and generalization across diverse scenarios, confirming its effectiveness.
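The two constraints named above (cross-modal invariance of a per-subject code, plus a subject-level separation constraint) can be illustrated with a toy sketch. This is **not** the paper's implementation: the linear encoders, the specific loss forms, the margin, and all variable names here are our own assumptions, chosen only to make the idea concrete.

```python
# Toy sketch (assumed, not the paper's method): approximate a per-subject
# representation X_h that is (a) invariant across two modality views of the
# same subject and (b) distinct between different subjects.
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Hypothetical linear encoder mapping modality features to a shared space."""
    return x @ W

def invariance_loss(za, zb):
    """Cross-modal invariance: two views of the same subject should agree."""
    return float(np.mean((za - zb) ** 2))

def separation_loss(z, margin=1.0):
    """Subject-level constraint: hinge penalty keeping different subjects apart."""
    n, loss, pairs = len(z), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(z[i] - z[j])
            loss += max(0.0, margin - d) ** 2
            pairs += 1
    return loss / max(pairs, 1)

# Toy data: 4 subjects, each observed in two modalities (e.g., CT and MR views).
subjects = rng.normal(size=(4, 8))           # latent per-subject factors
xa = subjects @ rng.normal(size=(8, 16))     # modality-A observation
xb = subjects @ rng.normal(size=(8, 16))     # modality-B observation

Wa = rng.normal(size=(16, 4)) * 0.1
Wb = rng.normal(size=(16, 4)) * 0.1

# One analytic gradient step on the invariance term for W_a
# (dL/dWa = 2/N * xa.T @ (za - zb)); the cross-modal gap shrinks.
za, zb = encode(xa, Wa), encode(xb, Wb)
before = invariance_loss(za, zb)
Wa = Wa - 0.01 * (2.0 * xa.T @ (za - zb) / za.size)
after = invariance_loss(encode(xa, Wa), zb)
```

In a real system the linear maps would be modality-specific deep encoders, and the summary's "learnable biological prior" would add a further regularizer on the shared code; this sketch only shows how the invariance and subject-separation terms pull in complementary directions.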