🤖 AI Summary
Clinical multimodal modeling faces persistent challenges, including missing modalities, limited sample sizes, dimensionality imbalance, and insufficient interpretability. To address these, we conduct the first structured review of 69 medical multimodal studies, establishing a "problem–solution" mapping framework, and propose guidelines for fusion strategy selection along with an interpretability evaluation pathway. Methodologically, we integrate transfer learning, generative models, cross-modal attention mechanisms, and neural architecture search, emphasizing modality alignment and adaptive fusion. We distill five major challenge categories and their empirically validated solutions, yielding a comprehensive technical roadmap spanning medical imaging, genomics, wearable sensors, and electronic health records. This work provides both theoretical foundations and practical paradigms for designing, evaluating, and clinically deploying multimodal AI systems in healthcare.
📝 Abstract
Multimodal data modeling has emerged as a powerful approach in clinical research, enabling the integration of diverse data types such as imaging, genomics, wearable sensors, and electronic health records. Despite its potential to improve diagnostic accuracy and support personalized care, modeling such heterogeneous data presents significant technical challenges. This systematic review synthesizes findings from 69 studies to identify common obstacles, including missing modalities, limited sample sizes, dimensionality imbalance, interpretability issues, and the selection of optimal fusion strategies. We highlight recent methodological advances, such as transfer learning, generative models, attention mechanisms, and neural architecture search, that offer promising solutions. By mapping current trends and innovations, this review provides a comprehensive overview of the field and offers practical insights to guide future research and development in multimodal modeling for medical applications.