🤖 AI Summary
This work addresses the performance degradation and limited interpretability of clinical models caused by missing modalities in multimodal data. It formulates clinical diagnosis as an autoregressive sequence prediction task, modeled with a Transformer-based causal decoder, and introduces a missingness-aware contrastive pre-training strategy that integrates multiple modalities in a shared latent space. The pre-training mitigates the divergent model behavior that removing modalities otherwise causes, and the framework supports interpretability analysis of patient stay trajectories. On the MIMIC-IV and eICU fine-tuning benchmarks, the approach outperforms baseline methods.
📝 Abstract
An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. Because clinical datasets are inherently temporal and sparse in modality presence, it remains difficult for diagnostic multimodal ML models to capture the underlying predictive signal while staying explainable. In this work, we address this problem by reframing clinical diagnosis as an autoregressive sequence modeling task, using causal decoders from large language models (LLMs) to model a patient's multimodal trajectory. We first introduce a missingness-aware contrastive pre-training objective that integrates multiple modalities, in datasets with missingness, into a shared latent space. We then show that autoregressive sequence modeling with transformer-based architectures outperforms baselines on the MIMIC-IV and eICU fine-tuning benchmarks. Finally, going beyond performance gains, we use interpretability techniques to show that, across various patient stays, removing modalities leads to divergent model behavior, which our contrastive pre-training mitigates. By abstracting clinical diagnosis as sequence modeling and interpreting patient stay trajectories, we develop a framework to profile and handle missing modalities while addressing the need for safe, transparent clinical AI.
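The abstract does not give the form of the missingness-aware contrastive objective, so the following is only an illustrative sketch of one plausible reading: an InfoNCE-style loss between two modalities in which patients lacking either modality are masked out as both anchors and negatives. The function name `masked_info_nce`, the presence flags, the one-directional loss, and the `temperature` default are all assumptions made for illustration, not the authors' method.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def masked_info_nce(z_a, z_b, present_a, present_b, temperature=0.1):
    """Hypothetical missingness-aware contrastive loss (assumed form).

    z_a, z_b: per-patient embeddings of two modalities (lists of vectors).
    present_a, present_b: per-patient booleans, True if the modality was observed.
    Only patients with both modalities observed act as anchors or negatives.
    """
    valid = [i for i in range(len(z_a)) if present_a[i] and present_b[i]]
    if not valid:
        return 0.0  # no complete pair in this batch, nothing to contrast

    a = [_normalize(z_a[i]) for i in valid]
    b = [_normalize(z_b[i]) for i in valid]

    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this patient's modality-A embedding
        # against every valid patient's modality-B embedding.
        logits = [_dot(anchor, cand) / temperature for cand in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the same patient's embedding as the positive.
        total += log_denom - logits[i]
    return total / len(valid)
```

Under this sketch, a placeholder embedding for an unobserved modality never enters the loss, so it cannot pull observed embeddings toward an arbitrary point in the shared latent space. A symmetric variant would average this value with the same loss computed with the two modalities swapped.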