Machine Learning and Statistical Insights into Hospital Stay Durations: The Italian EHR Case

📅 2025-04-25

🤖 AI Summary

Objective: This study identifies key clinical and administrative determinants of length of stay (LoS) in hospitals across Italy’s Piedmont region, with the goal of supporting healthcare quality assessment and resource optimization. Method: Using electronic health record (EHR) data from over 60 hospitals (2020–2023), the study combines statistical analysis with machine learning (temporal feature engineering, clinical semantic encoding such as ICD-10-based comorbidity scoring, and tree-based modeling) to build a LoS prediction framework in a real-world regional EHR setting. Contribution/Results: The CatBoost model achieves the highest R² among the models evaluated (0.49), ahead of Random Forest. Key predictors include age group, Charlson Comorbidity Index, admission type, and seasonal (monthly) effects. The work offers a large-scale, region-wide, EHR-driven, multimethod analytical framework for LoS prediction in Italy, with interpretable, policy-relevant findings grounded in empirical evidence.
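The feature types named in the summary (age group, an ICD-10-based comorbidity score, admission type, and admission month) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the record layout, the field names, the 10-year age bands, and the small `TOY_WEIGHTS` table, which is a simplified Charlson-style subset and not the mapping used in the paper.

```python
from datetime import date

# Hypothetical, heavily simplified lookup: ICD-10 three-character prefix
# -> (condition label, Charlson-style weight). Illustrative subset only;
# not the code mapping used in the paper.
TOY_WEIGHTS = {
    "I21": ("myocardial infarction", 1),
    "I50": ("congestive heart failure", 1),
    "J44": ("chronic pulmonary disease", 1),
    "E11": ("diabetes", 1),
    "C78": ("metastatic solid tumour", 6),
}


def comorbidity_score(icd10_codes):
    """Sum the toy weights, counting each condition at most once."""
    seen = {}
    for code in icd10_codes:
        prefix = code.replace(".", "")[:3].upper()
        if prefix in TOY_WEIGHTS:
            label, weight = TOY_WEIGHTS[prefix]
            seen[label] = weight
    return sum(seen.values())


def age_group(age, width=10):
    """Bucket an age in years into a band such as '70-79' (assumed banding)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def build_features(record):
    """Turn one hospitalization record (assumed layout) into model features."""
    admitted = record["admission_date"]
    return {
        "age_group": age_group(record["age"]),
        "comorbidity_score": comorbidity_score(record["icd10_codes"]),
        "admission_type": record["admission_type"],
        "admission_month": admitted.month,  # simple seasonal (monthly) feature
    }


# Made-up record, not taken from the study's dataset.
example = {
    "age": 74,
    "icd10_codes": ["I50.0", "E11.9", "I50.9"],
    "admission_type": "emergency",
    "admission_date": date(2022, 1, 14),
}
features = build_features(example)
print(features)
```

Categorical outputs such as `age_group` and `admission_type` are left as strings here because tree-based libraries like CatBoost can take categorical columns directly; whether the paper encoded them this way is not stated in the summary.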

📝 Abstract

Length of hospital stay (LoS) is a critical metric for assessing healthcare quality and optimizing hospital resource management. This study aims to identify factors influencing LoS within the Italian healthcare context, using a dataset of hospitalization records from over 60 healthcare facilities in the Piedmont region, spanning from 2020 to 2023. We explored a variety of features, including patient characteristics, comorbidities, admission details, and hospital-specific factors. Significant correlations were found between LoS and features such as age group, comorbidity score, admission type, and the month of admission. Machine learning models, specifically CatBoost and Random Forest, were used to predict LoS. The highest R² score, 0.49, was achieved with CatBoost, demonstrating good predictive performance.
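For readers less familiar with the reported metric, the coefficient of determination can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses only the standard library; the observed and predicted stay lengths are made-up numbers chosen for easy arithmetic and are not results from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (days in hospital), not data from the study.
observed_days = [2, 4, 6, 8]
predicted_days = [3, 3, 7, 7]

model_score = r_squared(observed_days, predicted_days)
# Always predicting the mean stay gives the reference score of zero.
mean_baseline_score = r_squared(observed_days, [5, 5, 5, 5])
print(model_score, mean_baseline_score)
```

Read this way, an R² of 0.49 means the model's squared error is about 49% lower than that of always predicting the mean LoS, so roughly half of the variance in stay length remains unexplained.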

Problem

Research questions and friction points this paper is trying to address.

Identify factors influencing hospital stay duration in Italy

Predict length of stay using patient and hospital data

Evaluate machine learning models for healthcare resource optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Used CatBoost and Random Forest models

Analyzed patient and hospital-specific features

Achieved an R² score of 0.49 with CatBoost
