🤖 AI Summary
For large-scale online prediction scenarios where ground-truth labels are delayed or unavailable, this paper proposes a lightweight, adaptive unsupervised concept drift detection method. The method integrates statistical process control (SPC) with a historical drift feedback mechanism, establishing an incremental sliding-window hypothesis testing framework to efficiently monitor input-output distribution shifts. Compared to existing label-free approaches, it achieves up to a 37% improvement in detection sensitivity at the same false positive rate, while reducing memory and time overhead by an order of magnitude. Its core innovation lies in leveraging sparse drift feedback to dynamically calibrate statistical thresholds, which substantially enhances statistical power, particularly under small-sample and resource-constrained conditions, without compromising robustness or practical deployability.
📝 Abstract
Machine learning models are increasingly used to automate decisions in almost every domain, and maintaining the performance of these models is crucial for delivering high-quality machine-learning-enabled services. Detecting concept drift early is therefore of the highest importance. Much research on concept drift has focused on the supervised case, which assumes that the true labels are available immediately after predictions are made. Controlling for false positives while periodically monitoring the performance of predictive models that make inferences on extremely large datasets, where the true labels are not instantly available, becomes extremely challenging. We propose a flexible and efficient concept drift detection algorithm that uses classical statistical process control in a label-less setting to accurately detect concept drifts. We show empirically that, under computational constraints, our approach has better statistical power than previously known methods. Furthermore, we introduce a new drift detection framework to model the scenario of detecting drift (without labels) given prior detections, and show how our drift detection algorithm can be incorporated effectively into this framework. We demonstrate promising performance via numerical simulations.
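
To make the idea of label-free statistical process control concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a plain Shewhart-style control chart over the batch mean of a model's output scores. The class name `ScoreControlChart`, the limit multiplier `k=3.0`, and the simulated Gaussian scores are all illustrative assumptions. The prior-detection feedback described in the abstract is not modeled here.

```python
import math
import random
from statistics import fmean, pstdev


class ScoreControlChart:
    """Hypothetical Shewhart-style chart over batch means of model scores.

    No ground-truth labels are used: only the model's own outputs.
    """

    def __init__(self, reference_scores, k=3.0):
        # Calibrate the in-control mean and spread from a reference window.
        self.mu = fmean(reference_scores)
        self.sigma = pstdev(reference_scores)
        self.k = k  # assumed width of the control limits, in standard errors

    def check(self, batch):
        """Return True if the batch mean falls outside the control limits."""
        half_width = self.k * self.sigma / math.sqrt(len(batch))
        return abs(fmean(batch) - self.mu) > half_width


# Simulated scores (an assumption for illustration, not data from the paper).
rng = random.Random(0)
reference = [rng.gauss(0.30, 0.05) for _ in range(5000)]
chart = ScoreControlChart(reference)

# A batch whose mean score has moved from 0.30 to 0.40.
shifted_batch = [rng.gauss(0.40, 0.05) for _ in range(500)]
shifted_alarm = chart.check(shifted_batch)
print("drift flagged on shifted batch:", shifted_alarm)
```

In this sketch the false positive rate is governed by `k`: wider limits raise fewer false alarms but detect small shifts more slowly, which is the trade-off the abstract refers to when it discusses controlling false positives and statistical power.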