🤖 AI Summary
This study addresses key challenges in medical multimodal fusion, namely the difficulty of integrating medical imaging with electronic health records (EHR), coarse-grained cross-modal interaction, and limited interpretability, by proposing a hypernetwork-based conditional fusion framework. The method employs a dual-stream encoder architecture in which a hypernetwork dynamically generates parameters for the MRI encoder conditioned on EHR tabular features, enabling fine-grained, interpretable cross-modal modulation of visual representations. The model is trained end-to-end and achieves statistically significant improvements over unimodal baselines and state-of-the-art fusion approaches on both brain age prediction and multi-class Alzheimer's disease classification, demonstrating strong generalizability and robustness. Its core contribution is the first application of hypernetworks to healthcare multimodal fusion, enabling EHR-conditioned dynamic learning of visual representations.
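To make the conditioning mechanism concrete, below is a minimal PyTorch sketch of a hypernetwork-generated layer: a small MLP maps the EHR tabular vector to the weights and bias of a per-sample linear layer applied to image-derived features. The names (`HyperLinear`), the hidden width of 64, and all dimensions are illustrative assumptions, not the authors' implementation; see the linked repository for the actual architecture.

```python
import torch
import torch.nn as nn


class HyperLinear(nn.Module):
    """Linear layer whose weights are generated by a hypernetwork
    conditioned on EHR tabular features (illustrative sketch only)."""

    def __init__(self, tab_dim: int, in_features: int, out_features: int):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Hypernetwork: maps the tabular vector to a flat parameter vector
        # holding the weight matrix and bias of the primary linear layer.
        n_params = out_features * in_features + out_features
        self.hyper = nn.Sequential(
            nn.Linear(tab_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_params),
        )

    def forward(self, x: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        # x:   (B, in_features)  image-derived features
        # tab: (B, tab_dim)      EHR tabular features
        params = self.hyper(tab)                              # (B, n_params)
        n_w = self.out_features * self.in_features
        w = params[:, :n_w].view(-1, self.out_features, self.in_features)
        b = params[:, n_w:]                                   # (B, out_features)
        # Per-sample linear map: each subject's EHR yields its own weights,
        # so the image features are modulated by the tabular modality.
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b
```

Because the generated weights depend on the tabular input, gradients flow through both the image pathway and the hypernetwork, which is what allows the whole fusion model to be trained end-to-end.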
📝 Abstract
The integration of diverse clinical modalities, such as medical imaging and the tabular data extracted from patients' Electronic Health Records (EHRs), is a crucial aspect of modern healthcare. Integrative analysis of multiple sources can provide a comprehensive understanding of a patient's clinical condition, improving diagnosis and treatment decisions. Deep Neural Networks (DNNs) consistently demonstrate outstanding performance across a wide range of multimodal tasks in the medical domain. However, effectively merging medical imaging with clinical, demographic, and genetic information represented as numerical tabular data remains an active and ongoing research pursuit. We present a novel framework based on hypernetworks that fuses clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. This approach leverages the complementary information in these modalities to enhance the accuracy of various medical applications. We demonstrate the strength and generality of our method on two brain Magnetic Resonance Imaging (MRI) analysis tasks: brain age prediction conditioned on the subject's sex, and multi-class Alzheimer's Disease (AD) classification conditioned on tabular data. We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular fusion methods. Our code is available at https://github.com/daniel4725/HyperFusion.
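As an illustration of how such a conditioned layer might sit inside an end-to-end model for the two tasks above, here is a hypothetical sketch pairing a toy 3D CNN MRI encoder with the `HyperLinear` head from the previous snippet; one would set `n_outputs=1` for brain age regression or to the number of diagnostic classes for AD classification. The backbone, feature widths, and `tab_dim` are placeholder assumptions, not HyperFusion's actual configuration.

```python
import torch
import torch.nn as nn

# Reuses the HyperLinear module defined in the previous sketch.


class HyperFusionModel(nn.Module):
    """Toy end-to-end model: a small 3D CNN encodes the MRI volume and an
    EHR-conditioned HyperLinear head produces the task output."""

    def __init__(self, tab_dim: int, n_outputs: int):
        super().__init__()
        self.backbone = nn.Sequential(    # stand-in for the real MRI encoder
            nn.Conv3d(1, 8, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),      # global pooling -> (B, 8, 1, 1, 1)
            nn.Flatten(),                 # -> (B, 8)
        )
        self.head = HyperLinear(tab_dim, in_features=8, out_features=n_outputs)

    def forward(self, mri: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(mri), tab)


model = HyperFusionModel(tab_dim=9, n_outputs=1)   # n_outputs=1: age regression
mri = torch.randn(2, 1, 64, 64, 64)                # two toy MRI volumes
tab = torch.randn(2, 9)                            # two toy EHR feature vectors
print(model(mri, tab).shape)                       # torch.Size([2, 1])
```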