🤖 AI Summary
Breast cancer molecular subtyping is complicated by the heterogeneous availability of multimodal data, including copy number variation (CNV), clinical records, and whole-slide images (WSIs), and by modality combinations that vary across patients and sites. Method: We propose a loosely coupled, scalable multimodal learning framework with a dual-branch WSI representation module that jointly leverages CNNs and graph neural networks to capture local and structural features, together with a modality-agnostic alignment-attention fusion mechanism that allows modalities to be added or removed dynamically without retraining. Contribution/Results: To our knowledge, this is the first end-to-end multimodal fusion framework to achieve such architectural decoupling. It significantly outperforms state-of-the-art methods on TCGA and multi-institutional real-world clinical datasets, transfers readily to other cancer types, and offers both deployment flexibility and computational efficiency in clinical settings.
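The paper does not publish its architecture here, so the following is a minimal PyTorch sketch of the dual-branch WSI idea as described in the summary: a CNN branch over raw patch pixels plus a lightweight message-passing branch over a patch-adjacency graph, pooled into a single slide embedding. The backbone, dimensions, and the one-step mean-aggregation "GNN" are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dual-branch WSI encoder (not the paper's code).
import torch
import torch.nn as nn

class DualBranchWSIEncoder(nn.Module):
    """Encodes a bag of WSI patches via (a) a CNN over patch pixels and
    (b) a simple message-passing step over a patch-adjacency graph, then
    concatenates the two slide-level embeddings."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # CNN branch: captures local texture within each patch (assumed backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Graph branch: one round of neighbor averaging stands in for
        # whatever graph layers the paper actually uses.
        self.gnn_lin = nn.Linear(embed_dim, embed_dim)

    def forward(self, patches: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # patches: (N, 3, H, W) patch bag; adj: (N, N) spatial adjacency.
        feats = self.cnn(patches)                      # (N, embed_dim) local features
        deg = adj.sum(-1, keepdim=True).clamp(min=1)   # guard against isolated patches
        neigh = (adj @ feats) / deg                    # mean over graph neighbors
        graph_feats = torch.relu(self.gnn_lin(neigh))  # structural features
        # Slide-level embedding: mean-pool each branch, then concatenate.
        return torch.cat([feats.mean(0), graph_feats.mean(0)], dim=-1)

enc = DualBranchWSIEncoder()
slide = enc(torch.randn(16, 3, 64, 64), (torch.rand(16, 16) > 0.7).float())
print(slide.shape)  # torch.Size([512])
```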
📝 Abstract
Healthcare applications are inherently multimodal and benefit greatly from the integration of diverse data sources. However, the modalities available in clinical settings can vary across locations and patients. One task that stands to gain from multimodal integration is breast cancer molecular subtyping, an important clinical problem whose solution can facilitate personalized treatment and improve patient prognosis. In this work, we propose a scalable, loosely coupled multimodal framework that seamlessly integrates data from multiple modalities, including copy number variation (CNV), clinical records, and histopathology images, to improve breast cancer subtyping. While our primary focus is breast cancer, the framework is designed to accommodate additional modalities with minimal overhead: it can scale up or down without retraining the encoders of existing modalities, which also makes it applicable to other cancer types. We introduce a dual-branch representation for whole slide images (WSIs) that combines traditional image-based and graph-based WSI representations; this dual approach yields significant performance improvements. We also present a new multimodal fusion strategy and demonstrate its ability to improve performance across a range of modality-availability conditions. Our comprehensive results show that integrating the dual-branch WSI representation with CNV and clinical health records, together with our pipeline and fusion strategy, outperforms state-of-the-art methods in breast cancer subtyping.
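To make the "add or remove modalities without retraining" claim concrete, here is a hedged sketch of one plausible reading of the fusion strategy: each modality gets its own small encoder mapping into a shared token space, and a learned query attends over whichever tokens are present for a given patient. The `add_modality` registry, the encoder choices, the dimensions, and the four-class head are all assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of modality-agnostic attention fusion (not the paper's code).
import torch
import torch.nn as nn

class ModalityAgnosticFusion(nn.Module):
    def __init__(self, dim: int = 128, n_classes: int = 4):
        super().__init__()
        self.encoders = nn.ModuleDict()                     # one encoder per modality
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # learned fusion query
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_classes)               # subtype logits

    def add_modality(self, name: str, in_dim: int, dim: int = 128) -> None:
        # Registering a new modality only adds a fresh encoder; existing
        # encoders and the fusion layers are untouched (loose coupling).
        self.encoders[name] = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Only the modalities present for this patient contribute tokens.
        tokens = torch.stack(
            [self.encoders[m](x) for m, x in inputs.items()], dim=1
        )                                                    # (B, M_available, dim)
        fused, _ = self.attn(self.query.expand(tokens.size(0), -1, -1),
                             tokens, tokens)                 # attend over available tokens
        return self.head(fused.squeeze(1))                   # (B, n_classes)

model = ModalityAgnosticFusion()
model.add_modality("cnv", in_dim=2000)       # assumed CNV feature width
model.add_modality("clinical", in_dim=30)    # assumed clinical-record feature width
logits = model({"cnv": torch.randn(2, 2000), "clinical": torch.randn(2, 30)})
print(logits.shape)  # torch.Size([2, 4])
```

Under this reading, a missing modality simply contributes no token, and adding one later means training only its new encoder while the rest of the network stays frozen, which is consistent with the scalability claim above.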