🤖 AI Summary
To address the challenges posed by increasingly high-dimensional, long-sequence, multimodal, and severely incomplete electronic health record (EHR) data, this paper proposes a dual-axis Transformer architecture that jointly models attention across the clinical-variable and temporal dimensions. The method explicitly encodes missingness patterns to improve robustness to sparse observations, and it learns transferable sensor embeddings. The authors also re-implemented mainstream baselines, previously scattered across repositories or dependent on deprecated libraries, in a single PyTorch codebase. On the MIMIC-III sepsis prediction task the approach achieves state-of-the-art (SOTA) performance, and on in-hospital mortality classification it matches top-performing methods. Crucially, it demonstrates significantly improved adaptability to missing data and stable generalization across diverse clinical scenarios.
📝 Abstract
Electronic Health Records (EHRs), the digital representation of a patient's medical history, are a valuable resource for epidemiological and clinical research. They are also becoming increasingly complex, with recent trends toward larger datasets, longer time series, and multi-modal integration. Transformers, which have rapidly gained popularity owing to their success in natural language processing and other domains, are well suited to these challenges because they model long-range dependencies and process data in parallel. However, their application to EHR classification remains limited by data representations that can reduce performance or fail to capture informative missingness. In this paper, we present the Bi-Axial Transformer (BAT), which attends to both the clinical-variable and time-point axes of EHR data to learn richer data relationships and address the difficulties of data sparsity. BAT achieves state-of-the-art performance on sepsis prediction and is competitive with top methods for mortality classification. Compared to other transformers, BAT demonstrates increased robustness to data missingness and learns distinct sensor embeddings that can be used in transfer learning. Baseline models, previously scattered across multiple repositories or reliant on deprecated libraries, were re-implemented in PyTorch and made available for reproduction and future benchmarking.
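To make the dual-axis idea concrete, here is a minimal NumPy sketch of attention applied along both axes of an EHR tensor shaped (time steps, clinical variables, embedding dim). This is an illustrative simplification under assumed conventions, not the paper's actual block: the function names, the single-head unprojected attention, and the sequential time-then-variable ordering are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Simplified single-head scaled dot-product self-attention.
    # x: (seq_len, d); no learned Q/K/V projections, for illustration only.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ x    # (seq_len, d)

def bi_axial_block(x):
    # x: (time, variables, d). Hypothetical simplification of bi-axial
    # attention: first attend over time for each variable, then over
    # variables at each time step.
    t, v, _ = x.shape
    # Time-axis attention: one sequence per clinical variable.
    x = np.stack([self_attention(x[:, j, :]) for j in range(v)], axis=1)
    # Variable-axis attention: one sequence per time step.
    x = np.stack([self_attention(x[i, :, :]) for i in range(t)], axis=0)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 16))  # 8 time steps, 5 variables, 16-dim embeddings
out = bi_axial_block(x)
print(out.shape)  # (8, 5, 16)
```

The variable-axis pass lets each measurement attend to co-occurring measurements (capturing cross-variable structure and missingness patterns), while the time-axis pass captures long-range temporal dependencies; a real implementation would add projections, multiple heads, residual connections, and masking for unobserved entries.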