Early Prediction of Type 2 Diabetes Using Multimodal data and Tabular Transformers

📅 2026-01-19
🤖 AI Summary
This study addresses the limited modeling of multimodal longitudinal data in early risk prediction for type 2 diabetes mellitus (T2DM), using the Qatar Biobank cohort. It presents a novel application of a Tabular Transformer that integrates electronic health records (EHR) with dual-energy X-ray absorptiometry (DXA) measurements for T2DM prediction. By modeling temporal dependencies among long-term health indicators and using SMOTE and SMOTE-ENN to mitigate class imbalance, the model achieves a ROC AUC of at least 79.7% on a cohort of 1,382 participants, outperforming conventional machine learning models and general-purpose generative AI models (Claude 3.5 Sonnet, GPT-4, and Gemini Pro). Key predictive features include visceral adipose tissue mass and lumbar spine bone mineral density, offering interpretable, personalized insights for early T2DM intervention in the Qatari population.
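
SMOTE balances the classes by creating synthetic minority-class rows instead of duplicating existing ones: each new row is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. SMOTE-ENN then applies Edited Nearest Neighbours, which deletes samples whose label disagrees with the majority of their nearest neighbours (that cleaning step is not shown). The pure-Python sketch below illustrates only the oversampling step, under stated assumptions: the function name `smote`, the Euclidean distance, and the default `k=5` (which mirrors the default `k_neighbors` of imbalanced-learn's `SMOTE`) are ours, since the paper's settings are not given here. For real work, `imblearn.over_sampling.SMOTE` and `imblearn.combine.SMOTEENN` provide tested implementations.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create `n_synthetic` synthetic minority-class rows by SMOTE-style interpolation.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    minority: list of numeric feature vectors (lists of floats) from the minority class.
    k: number of nearest minority-class neighbours to interpolate toward.
    seed: makes the random draws reproducible.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a sample cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by Euclidean distance, excluding itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy two-feature minority class (hypothetical values, not study data)
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
    new_rows = smote(minority, n_synthetic=6, k=2)
    print(len(new_rows), new_rows[0])
```

For the male subgroup reported in the abstract (146 diabetic, 579 healthy), full balancing would call for 579 - 146 = 433 synthetic rows. As a general precaution, resampling is applied only to the training split so that synthetic points never reach the evaluation data.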

📝 Abstract
This study introduces a novel approach for early Type 2 Diabetes Mellitus (T2DM) risk prediction using a tabular transformer (TabTrans) architecture to analyze longitudinal patient data. By processing patients' longitudinal health records and bone-related tabular data, our model captures complex, long-range dependencies in disease progression that conventional methods often overlook. We validated our TabTrans model on a retrospective Qatar Biobank (QBB) cohort of 1,382 subjects, comprising 725 men (146 diabetic, 579 healthy) and 657 women (133 diabetic, 524 healthy). The study integrated electronic health records (EHR) with dual-energy X-ray absorptiometry (DXA) data. To address class imbalance, we employed the SMOTE and SMOTE-ENN resampling techniques. The proposed model's performance was evaluated against conventional machine learning (ML) models and generative AI models, including Claude 3.5 Sonnet (Anthropic's constitutional AI), GPT-4 (OpenAI's generative pre-trained transformer), and Gemini Pro (Google's multimodal language model). Our TabTrans model demonstrated superior predictive performance, achieving a ROC AUC ≥ 79.7% for T2DM prediction, higher than both the generative AI models and the conventional ML approaches. Feature interpretation analysis identified the key risk indicators: visceral adipose tissue (VAT) mass and volume, Ward's triangle bone mineral density (BMD) and bone mineral content (BMC), T- and Z-scores, and L1-L4 scores emerged as the most important predictors associated with diabetes development in Qatari adults. These findings demonstrate the significant potential of TabTrans for analyzing complex tabular healthcare data, providing a powerful tool for proactive T2DM management and personalized clinical interventions in the Qatari population.

Index Terms: tabular transformers, multimodal data, DXA data, diabetes, T2DM, feature interpretation, tabular data
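
The headline ROC AUC has a direct probabilistic reading: it equals the probability that a randomly chosen diabetic participant receives a higher predicted risk than a randomly chosen healthy one, with ties counted as one half (the Mann-Whitney U statistic divided by the number of positive-negative pairs). Under that reading, an AUC of 0.797 means the model ranks the diabetic member higher in roughly 79.7% of such pairs. The sketch below computes the metric by brute-force pair counting; the function name `roc_auc` and the toy data are illustrative assumptions, not taken from the paper, and scikit-learn's `sklearn.metrics.roc_auc_score` returns the same value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (P * N)).

    Illustrative sketch (hypothetical helper).
    labels: 1 for the positive class (e.g. diabetic), 0 for the negative class.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels and risk scores (hypothetical, not study data)
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(roc_auc(labels, scores))  # 7 of the 9 positive-negative pairs are correctly ordered
```

The double loop costs P × N comparisons, about 279 × 1,103 ≈ 308,000 for the full cohort, which is trivial; rank-based formulas computed after a single sort scale better for larger datasets.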
Problem

Research questions and friction points this paper is trying to address.

Type 2 Diabetes
early prediction
multimodal data
tabular data
longitudinal health records
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tabular Transformers
Multimodal Data
Early T2DM Prediction
DXA Integration
Longitudinal Health Records
Sulaiman Khan
College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation
Machine Learning, Longitudinal Analysis, Healthcare Big Data Analytics
Md. Rafiul Biswas
College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
Zubair Shah
College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar