🤖 AI Summary
To address the modeling challenges posed by long sequences, high sparsity, and pervasive missingness in electronic health records (EHRs), this paper proposes HyMaTE, a hybrid Mamba-Transformer architecture. The model leverages Mamba's linear-complexity state-space modeling to capture longitudinal temporal dependencies over very long contexts, while a channel-wise multi-head self-attention mechanism models cross-variable feature interactions. An interpretability module further surfaces clinically salient, decision-relevant features. Together, these components unify sequence-level and channel-level representation learning within a single architecture. Evaluated on multiple real-world EHR datasets, the model achieves significant improvements on critical prediction tasks, including in-hospital mortality and sepsis onset, with average AUC gains of 2.1 to 4.7 percentage points, while remaining computationally efficient, scalable, and practical for clinical deployment.
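The exact layer design is specified in the paper and the repository linked below; as a rough illustration of the hybrid idea, the following is a minimal PyTorch sketch in which a Mamba layer mixes information along the time axis and multi-head self-attention mixes information across the channel (variable) axis. It assumes the public `mamba_ssm` package's `Mamba` layer; all class names, shapes, and hyperparameters here (`HybridBlock`, `ChannelAttention`, `d_model`, etc.) are illustrative assumptions, not taken from HyMaTE.

```python
# Minimal sketch of one hybrid block: Mamba for temporal mixing,
# multi-head self-attention for cross-channel mixing.
# Assumes the `mamba_ssm` package (https://github.com/state-spaces/mamba).
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed external dependency


class ChannelAttention(nn.Module):
    """Multi-head self-attention applied over the channel (variable) axis."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, time, channels, d_model)
        b, t, c, d = x.shape
        y = x.reshape(b * t, c, d)              # treat channels as the attention "sequence"
        y, _ = self.attn(y, y, y)
        return self.norm(x + y.reshape(b, t, c, d))


class HybridBlock(nn.Module):
    """Mamba mixes information along time; attention mixes across channels."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)     # linear-time state-space layer
        self.norm = nn.LayerNorm(d_model)
        self.channel_attn = ChannelAttention(d_model, n_heads)

    def forward(self, x):                       # x: (batch, time, channels, d_model)
        b, t, c, d = x.shape
        y = x.permute(0, 2, 1, 3).reshape(b * c, t, d)   # one time series per channel
        y = self.mamba(y)                                # temporal mixing, O(t) in length
        y = y.reshape(b, c, t, d).permute(0, 2, 1, 3)
        x = self.norm(x + y)
        return self.channel_attn(x)                      # cross-variable mixing


# Example: batch of 2 stays, 48 time steps, 10 clinical variables, 32-dim embeddings
block = HybridBlock(d_model=32)
out = block(torch.randn(2, 48, 10, 32))         # -> (2, 48, 10, 32)
```

Applying attention across channels at each time step keeps the quadratic attention cost in the (typically small) number of clinical variables rather than the (long) sequence length, which is the efficiency rationale behind pairing it with a linear-time SSM for temporal mixing.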
📝 Abstract
Electronic Health Records (EHRs) have become a cornerstone of modern healthcare and are crucial for analyzing the progression of patient health; however, their complexity, characterized by long multivariate sequences, sparsity, and missing values, poses significant challenges for traditional deep learning models. While Transformer-based models have demonstrated success in modeling EHR data and predicting clinical outcomes, their quadratic computational complexity and limited context length hinder their efficiency and practical applicability. State Space Models (SSMs) such as Mamba present a promising alternative, offering linear-time sequence modeling and improved efficiency on long sequences, but they focus mostly on mixing information along the sequence dimension rather than across channels. To overcome these challenges, we propose HyMaTE (A Hybrid Mamba and Transformer Model for EHR Representation Learning), a novel hybrid model tailored for representing longitudinal data that combines the strengths of SSMs with advanced attention mechanisms. Through experiments on predictive tasks across multiple clinical datasets, we demonstrate HyMaTE's ability to learn an effective, richer, and more nuanced unified representation of EHR data. Additionally, the interpretability afforded by self-attention illustrates the effectiveness of our model as a scalable and generalizable solution for real-world healthcare applications. Code is available at: https://github.com/healthylaife/HyMaTE.