🤖 AI Summary
Existing models for predicting length of stay (LOS) in acute stroke patients suffer from poor performance, limited generalizability, and neglect of system-level factors. Method: We propose a multi-level stacking ensemble model, interpreted with SHAP, that integrates patient-, clinical-, and system-level features (e.g., stroke unit care, care coordinator involvement) to predict prolonged hospitalization (≥9 days for ischemic, ≥11 days for hemorrhagic stroke). Feature selection employed correlation-based filtering; model evaluation used AUC and calibration curves. Results: In the ischemic stroke cohort (n=12,575), the model achieved an AUC of 0.824, significantly outperforming logistic regression (AUC 0.805; P=0.0004). In the hemorrhagic stroke cohort (n=1,970), its AUC of 0.843 was not significantly better than logistic regression (AUC 0.828; P=0.136). SHAP analysis identified interpretable predictors shared across both subtypes, including rehabilitation assessment, urinary incontinence, and inability to walk independently.
📝 Abstract
Length of stay (LOS) prediction in acute stroke is critical for improving care planning. Existing machine learning models have shown suboptimal predictive performance and limited generalisability, and have overlooked system-level factors. We aimed to enhance model efficiency, performance, and interpretability by refining predictors and developing an interpretable multi-level stacking ensemble model. Data were accessed from the biennial Stroke Foundation Acute Audit (2015, 2017, 2019, 2021) in Australia. Models were developed separately for ischaemic and haemorrhagic stroke. The outcome was prolonged LOS, defined as LOS above the 75th percentile. Candidate predictors (ischaemic: n=89; haemorrhagic: n=83) were categorised into patient, clinical, and system domains. Correlation-based feature selection was used to refine key predictors. Model evaluation covered discrimination (AUC), calibration (calibration curves), and interpretability (SHAP plots). Prolonged LOS was ≥9 days in ischaemic stroke (N=12,575) and ≥11 days in haemorrhagic stroke (N=1,970). For ischaemic stroke, the ensemble model [AUC: 0.824 (95% CI: 0.801-0.846)] significantly outperformed logistic regression [AUC: 0.805 (95% CI: 0.782-0.829); P=0.0004]. For haemorrhagic stroke, however, the ensemble model [AUC: 0.843 (95% CI: 0.790-0.895)] did not significantly outperform logistic regression [AUC: 0.828 (95% CI: 0.774-0.882); P=0.136]. SHAP analysis identified predictors shared by both stroke types: rehabilitation assessment, urinary incontinence, stroke unit care, inability to walk independently, physiotherapy, and stroke care coordinator involvement. An explainable ensemble model effectively predicted prolonged LOS in ischaemic stroke. Further validation in larger cohorts is needed for haemorrhagic stroke.
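As a rough illustration of two steps named in the abstract, the sketch below labels stays as prolonged when LOS exceeds the 75th percentile and then computes AUC as the probability that a prolonged stay receives a higher risk score than a non-prolonged one. It is a minimal sketch, not the authors' pipeline: the function names, the toy data, and the choice of the "inclusive" percentile method are all assumptions made for illustration, and the risk scores stand in for the output of any fitted model.

```python
from statistics import quantiles


def prolonged_los_labels(los_days):
    """Return (p75, labels); label 1 means LOS is strictly above the 75th percentile."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    # The "inclusive" interpolation method is an assumption, not taken from the paper.
    p75 = quantiles(los_days, n=4, method="inclusive")[2]
    return p75, [1 if d > p75 else 0 for d in los_days]


def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: LOS in days and risk scores from some hypothetical model.
los_days = [2, 3, 4, 5, 6, 7, 8, 9, 12, 20]
risk_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.7, 0.6, 0.8, 0.9]

threshold, labels = prolonged_los_labels(los_days)
model_auc = auc(labels, risk_scores)
print(f"75th percentile: {threshold} days, AUC: {model_auc:.3f}")
```

With whole-day LOS values, "strictly above the 75th percentile" and a "≥ k days" cut-off describe the same set of patients whenever the percentile falls below k and at or above k-1, which is one way the abstract's two phrasings of the outcome can coincide.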