🤖 AI Summary
Existing self-supervised multimodal methods suffer from rigid tabular modeling and depend strongly on specific cohorts, which hinders the learning of transferable, cross-center medical knowledge. To address this, we propose CITab, a cross-cohort self-supervised framework with two key innovations: (1) semantic-aware tabular encoding, which integrates column-header semantics into the table representation, and (2) a prototype-guided mixture-of-linear-layers module (P-MoLin), which specializes tabular features so the model can handle heterogeneous tabular data. CITab is pretrained jointly with contrastive learning and masked tabular reconstruction, facilitating both cross-cohort knowledge transfer and the disentangled learning of medical concepts. Evaluated on Alzheimer's disease diagnosis across three public cohorts (4,461 subjects), CITab outperforms state-of-the-art methods, demonstrating strong generalizability across cohorts.
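The summary states that pretraining combines contrastive learning with masked tabular reconstruction but does not give the exact losses. As a minimal sketch of the contrastive half only, assuming a CLIP-style symmetric InfoNCE objective in which the i-th image embedding and the i-th tabular embedding of a batch form the positive pair, the following illustrates the idea. The function names and the temperature of 0.1 are illustrative assumptions, not CITab's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against an integer target,
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def clip_style_loss(img_emb, tab_emb, temperature=0.1):
    """Assumed symmetric InfoNCE over a batch: image i and table row i are the
    positive pair; every other pairing in the batch serves as a negative."""
    imgs = [l2_normalize(v) for v in img_emb]
    tabs = [l2_normalize(v) for v in tab_emb]
    n = len(imgs)
    # sim[i][j] = cosine similarity of image i and table row j, scaled by 1/temperature
    sim = [[sum(a * b for a, b in zip(imgs[i], tabs[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-table direction: each row of sim is classified against its diagonal entry
    img_to_tab = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Table-to-image direction: each column of sim is classified against its diagonal entry
    tab_to_img = sum(cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (img_to_tab + tab_to_img)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_style_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: close to 0
    print(clip_style_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: close to 10
```

Matched image-table pairs drive the loss toward zero while mismatched pairs are penalized; the masked tabular reconstruction term is not sketched here.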
📝 Abstract
Multi-modal learning that integrates medical images and tabular data has significantly advanced clinical decision-making in recent years. Self-Supervised Learning (SSL) has emerged as a powerful paradigm for pretraining these models on large-scale unlabeled image-tabular data, with the aim of learning discriminative representations. However, existing SSL methods for image-tabular representation learning are often confined to specific data cohorts, mainly because their tabular modeling mechanisms are too rigid to handle heterogeneous tabular data. This inter-tabular barrier prevents multi-modal SSL methods from effectively learning transferable medical knowledge shared across diverse cohorts. In this paper, we propose CITab, a novel SSL framework designed to learn powerful multi-modal feature representations in a cross-tabular manner. We design the tabular modeling mechanism from a semantic-awareness perspective, integrating column headers as semantic cues; this facilitates the learning of transferable knowledge and allows pretraining to scale to multiple data sources. Additionally, we propose a prototype-guided mixture-of-linear-layers (P-MoLin) module for tabular feature specialization, enabling the model to handle the heterogeneity of tabular data effectively and to explore the underlying medical concepts. We conduct comprehensive evaluations on the Alzheimer's disease diagnosis task across three publicly available data cohorts comprising 4,461 subjects. Experimental results demonstrate that CITab outperforms state-of-the-art approaches, paving the way for effective and scalable cross-tabular multi-modal learning.
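The abstract describes two mechanisms without giving their equations: tabular tokens conditioned on column headers, and a prototype-guided mixture of linear layers. The sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation. `header_embedding` hashes header words into a toy vector as a stand-in for a pretrained text encoder, each cell token is that header vector with the cell value appended, and `p_molin` weights K bias-free linear maps by a softmax over cosine similarities between the token and K prototypes. All names, dimensions, and toy values are assumptions.

```python
import hashlib
import math

DIM = 8  # toy header-embedding width (assumption)


def header_embedding(header, dim=DIM):
    """Hypothetical stand-in for a pretrained text encoder: hash each word of a
    column header into a deterministic vector in [-0.5, 0.5]^dim and average."""
    words = header.lower().split() or [""]
    vec = [0.0] * dim
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    return [v / len(words) for v in vec]


def encode_row(row):
    """Turn a {header: normalized numeric value} dict into one token per column.
    Each token is the header embedding with the cell value appended, so the
    encoding depends on column names rather than column positions."""
    return [header_embedding(header) + [float(value)] for header, value in row.items()]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def p_molin(token, prototypes, experts, tau=0.5):
    """Hypothetical prototype-guided mixture of linear layers: the routing weight
    of expert k is a softmax over cosine(token, prototype_k) / tau, and the output
    is the weighted sum of the K linear maps applied to the token."""
    weights = softmax([cosine(token, p) / tau for p in prototypes])
    outputs = [matvec(w, token) for w in experts]
    mixed = [sum(wk * out[i] for wk, out in zip(weights, outputs))
             for i in range(len(outputs[0]))]
    return mixed, weights


if __name__ == "__main__":
    # Two toy cohorts with different column sets; only "age" is shared.
    cohort_a = {"age": 0.7, "MMSE score": 0.4}
    cohort_b = {"age": 0.7, "APOE4 carrier": 1.0, "education years": 0.5}
    tokens_a, tokens_b = encode_row(cohort_a), encode_row(cohort_b)
    print(tokens_a[0] == tokens_b[0])  # True: same header and value give the same token

    n = DIM + 1
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    prototypes = [tokens_a[1], [-x for x in tokens_a[1]]]  # toy prototypes (assumption)
    experts = [eye, [[2.0 * v for v in row] for row in eye]]  # toy linear maps
    _, weights = p_molin(tokens_a[1], prototypes, experts)
    print(weights)  # most of the weight goes to the first prototype's expert
```

Because tokens are keyed on header text rather than column position, the shared `age` column yields the same token in both toy cohorts even though their column sets differ, which is the property that lets tables from different cohorts share one encoder in this sketch.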