🤖 AI Summary
Long-term forecasting of intraoperative vital signs faces three critical challenges: the absence of standardized benchmarks, incomplete clinical data, and insufficient cross-center validation. To address these, we introduce the first standardized multi-center benchmark for intraoperative vital sign prediction, comprising over 4,000 surgical cases, and define three evaluation tracks: full-observation forecasting, robustness to simulated missing data, and cross-center generalization. We propose a masked loss function that reduces reliance on preprocessing and improves robustness to clinically realistic missingness patterns. Deep learning time-series models are trained and evaluated end to end on multi-center electronic health records. The benchmark improves model comparability and clinical relevance, providing a unified evaluation platform for developing generalizable, deployment-ready intraoperative prediction models.
📝 Abstract
Intraoperative monitoring and prediction of vital signs are critical for ensuring patient safety and improving surgical outcomes. Despite recent advances in deep learning models for medical time-series forecasting, several challenges persist, including the lack of standardized benchmarks, incomplete data, and limited cross-center validation. To address these challenges, we introduce VitalBench, a novel benchmark specifically designed for intraoperative vital sign prediction. VitalBench includes data from over 4,000 surgeries across two independent medical centers, offering three evaluation tracks: complete data, incomplete data, and cross-center generalization. This framework reflects the real-world complexities of clinical practice, minimizing reliance on extensive preprocessing and incorporating masked loss techniques for robust and unbiased model evaluation. By providing a standardized and unified platform for model development and comparison, VitalBench enables researchers to focus on architectural innovation while ensuring consistency in data handling. This work lays the foundation for advancing predictive models for intraoperative vital sign forecasting, ensuring that these models are not only accurate but also robust and adaptable across diverse clinical environments. Our code and data are available at https://github.com/XiudingCai/VitalBench.
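To make the masked loss idea concrete, here is a minimal sketch of one common form: a mean squared error computed only over observed target values, so missing readings need no imputation before training or evaluation. This is an illustration under stated assumptions, not the VitalBench implementation. The function name `masked_mse`, the NaN encoding of missing readings, and the example values are all hypothetical.

```python
import numpy as np


def masked_mse(pred, target, mask):
    """Mean squared error over observed entries only.

    Hypothetical helper for illustration; not taken from VitalBench.
    mask holds 1 (or True) where the target was observed and 0 where
    it is missing.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Zero out the error at missing positions so NaN targets cannot leak
    # into the sum.
    diff = np.where(mask, pred - target, 0.0)

    n_observed = mask.sum()
    if n_observed == 0:
        # No observed values: define the loss as zero instead of dividing by 0.
        return 0.0
    return float((diff ** 2).sum() / n_observed)


# Made-up example: two vital-sign channels over three time steps,
# with missing readings encoded as NaN (an assumption of this sketch).
target = np.array([[80.0, np.nan, 82.0],
                   [120.0, 118.0, np.nan]])
pred = np.array([[78.0, 90.0, 82.0],
                 [121.0, 118.0, 100.0]])
mask = ~np.isnan(target)

loss = masked_mse(pred, target, mask)
print(loss)  # 1.25
```

Only the four observed positions contribute, so predictions at the two missing positions (90.0 and 100.0 here) do not affect the score. Normalizing by the number of observed values, not the total number of entries, keeps losses comparable across cases with different amounts of missing data.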