🤖 AI Summary
Traditional multimodal integration methods struggle to model high-dimensional, nonlinear structures and fail to disentangle joint from modality-specific variations. To address this, we propose DeepJIVE—a deep learning-based multimodal decomposition framework that explicitly separates shared (joint) and individual (modality-specific) representations in a unified, nonlinear architecture. DeepJIVE incorporates explicit identity constraints and orthogonality regularization, coupled with customizable loss functions, to ensure faithful representation learning. It supports 1D–3D data (e.g., PET and MRI), and is validated on both synthetic benchmarks and the ADNI cohort. Results demonstrate its ability to uncover biologically interpretable co-variation patterns between amyloid PET and structural MRI—outperforming existing linear and shallow models significantly. The framework is open-source and exhibits broad applicability for cross-modal medical image analysis.
📝 Abstract
Conventional multimodal data integration methods provide a comprehensive assessment of the shared or unique structure within each individual data type but suffer from several limitations such as the inability to handle high-dimensional data and identify nonlinear structures. In this paper, we introduce DeepJIVE, a deep-learning approach to performing Joint and Individual Variance Explained (JIVE). We perform mathematical derivation and experimental validations using both synthetic and real-world 1D, 2D, and 3D datasets. Different strategies of achieving the identity and orthogonality constraints for DeepJIVE were explored, resulting in three viable loss functions. We found that DeepJIVE can successfully uncover joint and individual variations of multimodal datasets. Our application of DeepJIVE to the Alzheimer's Disease Neuroimaging Initiative (ADNI) also identified biologically plausible covariation patterns between the amyloid positron emission tomography (PET) and magnetic resonance (MR) images. In conclusion, the proposed DeepJIVE can be a useful tool for multimodal data analysis.