🤖 AI Summary
Multi-center clinical AI systems suffer from model output drift due to inter-site heterogeneity in patient demographics, imaging equipment, and acquisition protocols, leading to performance degradation; conventional centralized monitoring fails to capture site-specific drift dynamics. This paper proposes a distributed agent-based drift detection framework: lightweight site-local agents perform real-time, label-free, adaptive output drift identification and severity quantification. We introduce a site-aware monitoring paradigm featuring a novel dynamic reference distribution construction mechanism—requiring no historical data—and jointly leverage KL divergence and Wasserstein distance for batch-level distribution comparison. Drift severity is assessed via a hybrid binary classification and multi-level regression scheme. Evaluated on a multi-center task of predicting pathological complete response in breast cancer, our method achieves a 10.3% absolute improvement in drift detection F1-score (reaching 74.3%) and attains 83.7% F1 for severity classification—significantly outperforming centralized baselines.
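The paper does not include code, but the batch-level comparison it describes (KL divergence plus Wasserstein distance between a production batch of model outputs and a reference distribution) can be sketched roughly as follows. The histogram binning, bin count, the assumption that outputs are probabilities in [0, 1], and the `drift_scores` helper name are all illustrative choices, not details taken from the paper:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kl_divergence(p_samples, q_samples, bins=20, eps=1e-8):
    """Histogram-based estimate of KL(P || Q) for scores in [0, 1].

    eps keeps empty bins from producing log(0); bin count is a free choice.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

def drift_scores(batch, reference):
    """Compare one production batch of model outputs against a reference."""
    return {
        "kl": kl_divergence(batch, reference),
        "wasserstein": float(wasserstein_distance(batch, reference)),
    }

# Stand-in data: reference outputs vs. a clearly shifted production batch.
rng = np.random.default_rng(0)
reference = rng.beta(2, 5, size=1000)
shifted = rng.beta(5, 2, size=200)
scores = drift_scores(shifted, reference)
```

A downstream detector would then threshold (or feed into a classifier) these per-batch scores; how the two metrics are combined into the paper's binary detection and multi-level severity outputs is not specified here.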
📝 Abstract
Modern clinical decision support systems can concurrently serve multiple, independent medical imaging institutions, but their predictive performance may degrade across sites due to variations in patient populations, imaging hardware, and acquisition protocols. Continuous surveillance of predictive model outputs offers a safe and reliable approach for identifying such distributional shifts without ground truth labels. However, most existing methods rely on centralized monitoring of aggregated predictions, overlooking site-specific drift dynamics. We propose an agent-based framework for detecting drift and assessing its severity in multisite clinical AI systems. To evaluate its effectiveness, we simulate a multi-center environment for output-based drift detection, assigning each site a drift monitoring agent that performs batch-wise comparisons of model outputs against a reference distribution. We analyse several multi-center monitoring schemes that differ in how the reference is obtained (site-specific, global, production-only, and adaptive), alongside a centralized baseline. Results on real-world breast cancer imaging data using a pathological complete response prediction model show that all multi-center schemes outperform centralized monitoring, with F1-score improvements of up to 10.3% in drift detection. In the absence of site-specific references, the adaptive scheme performs best, with F1-scores of 74.3% for drift detection and 83.7% for drift severity classification. These findings suggest that adaptive, site-aware agent-based drift monitoring can enhance the reliability of multisite clinical decision support systems.
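One way to picture the adaptive scheme (a site-local agent that builds its reference from production batches only, with no historical data) is a rolling reference that absorbs batches judged stable and flags batches that diverge from it. Everything below is a hypothetical sketch: the `AdaptiveReferenceAgent` class, the warm-up period, the single Wasserstein threshold, and the reference window size are illustrative assumptions, not the paper's actual mechanism:

```python
import numpy as np
from scipy.stats import wasserstein_distance

class AdaptiveReferenceAgent:
    """Hypothetical site-local monitoring agent.

    Builds its reference distribution from the first few production batches
    (no historical data needed), then flags a batch as drifted when its
    Wasserstein distance to the rolling reference exceeds a threshold.
    Stable batches are folded back into the reference so it adapts over time.
    """

    def __init__(self, warmup_batches=2, max_ref=1000, threshold=0.1):
        self.warmup_batches = warmup_batches
        self.max_ref = max_ref          # cap on reference sample count
        self.threshold = threshold      # illustrative, not from the paper
        self._ref = np.empty(0)
        self._seen = 0

    def observe(self, batch):
        """Return True if the batch is flagged as drifted."""
        batch = np.asarray(batch, dtype=float)
        self._seen += 1
        if self._seen <= self.warmup_batches:
            # Still constructing the initial reference from production data.
            self._ref = np.concatenate([self._ref, batch])[-self.max_ref:]
            return False
        drifted = wasserstein_distance(batch, self._ref) > self.threshold
        if not drifted:
            # Fold stable batches into the rolling reference.
            self._ref = np.concatenate([self._ref, batch])[-self.max_ref:]
        return bool(drifted)

# Two warm-up batches and one stable batch pass; a shifted batch is flagged.
rng = np.random.default_rng(0)
agent = AdaptiveReferenceAgent()
flags = [agent.observe(rng.beta(2, 5, size=500)) for _ in range(3)]
flags.append(agent.observe(rng.beta(5, 2, size=500)))  # distribution shift
```

Each site runs its own agent instance, which is what makes the monitoring site-aware: thresholds and references evolve per site rather than over pooled predictions.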