🤖 AI Summary
This study addresses the limited generalizability of existing disease trajectory prediction models, which are typically trained on curated research cohorts or single-hospital data and therefore capture only part of each patient's trajectory. To overcome this, the authors propose DT-Transformer, a foundation model for disease trajectories trained on 57.1 million structured electronic health record entries from 1.7 million patients at Mass General Brigham, a health system spanning 11 hospitals and a broad network of outpatient clinics. Built on the Transformer architecture, DT-Transformer performs next-event prediction across diverse disease categories. Across 896 disease categories, the model achieves a median age- and sex-stratified AUC of 0.871, with every category exceeding an AUC of 0.5. Strong discrimination in both held-out and prospective validation supports health system-scale training as a route to models suited to real-world clinical forecasting.
📝 Abstract
Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. While electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical environments, models trained on curated research cohorts may not reflect routine deployment settings, and those trained on single-hospital datasets capture only fragments of each patient's trajectory. This gap highlights the importance of leveraging large, multi-hospital health systems for training and validation to better reflect real-world clinical complexity. In this work, we develop DT-Transformer, a foundation model trained on 57.1M structured EHR entries covering 1.7M patients from Mass General Brigham (MGB), spanning 11 hospitals and a broad network of outpatient clinics. DT-Transformer achieves strong discrimination in both held-out and prospective validation settings. Next-event prediction achieves a median age- and sex-stratified AUC of 0.871 across 896 disease categories, with all categories exceeding AUC 0.5. These results support health system-scale training as a path toward foundation models suited to real-world clinical forecasting.
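The abstract does not spell out how the age- and sex-stratified AUC is computed. As a minimal sketch of one common reading, the Python below computes an AUC inside each (age band, sex) stratum, averages over strata that contain both cases and controls, and then takes the median over disease categories. The 5-year age band, the record layout, the function names, and the toy numbers are all illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import defaultdict
from statistics import mean, median


def auc(scores, labels):
    """Mann-Whitney AUC: chance that a random case outranks a random control.

    Ties count as half a win. Returns None when a class is missing,
    because the AUC is undefined in that situation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(records, age_band_years=5):
    """Average the within-stratum AUC over (age band, sex) strata.

    `records` is an iterable of (age, sex, score, label) tuples for ONE
    disease category. The band width is an assumed parameter.
    """
    strata = defaultdict(lambda: ([], []))
    for age, sex, score, label in records:
        key = (int(age // age_band_years), sex)
        strata[key][0].append(score)
        strata[key][1].append(label)
    valid = [a for a in (auc(s, y) for s, y in strata.values()) if a is not None]
    return mean(valid) if valid else None


# Toy records for a single hypothetical disease category.
records = [
    (42, "F", 0.9, 1), (44, "F", 0.2, 0), (41, "F", 0.4, 0),  # stratum AUC 1.0
    (63, "M", 0.3, 1), (61, "M", 0.6, 0), (64, "M", 0.1, 0),  # stratum AUC 0.5
    (70, "F", 0.5, 0),                                        # no case: skipped
]
category_auc = stratified_auc(records)  # mean of 1.0 and 0.5

# Median over categories, using made-up per-category values.
per_category = {"category_a": category_auc, "category_b": 0.9, "category_c": 0.6}
median_auc = median(per_category.values())
```

Stratifying this way keeps the metric from rewarding a model that merely ranks older patients above younger ones, since every comparison between a case and a control happens within the same age band and sex.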