Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

📅 2024-06-24
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing unsupervised methods for real-time concept drift detection on unlabeled, high-dimensional, large-scale unstructured data (e.g., text, images, audio) suffer from low accuracy, high latency, and poor interpretability under realistic deployment conditions. This paper proposes DriftLens, an unsupervised online detection framework that measures distribution distances between deep learning representations, specifically Maximum Mean Discrepancy (MMD) and Wasserstein distance, combined with sliding-window statistics and class-level drift decomposition. DriftLens supports real-time detection, per-label root-cause attribution, and quantitative estimation of drift intensity. Across 13 use cases, it outperforms state-of-the-art methods in 11; runs at least 5× faster; produces drift values that correlate with the actual amount of drift (correlation ≥ 0.85); and remains robust to parameter changes.
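To make "distribution distance between deep representations" concrete, here is a minimal sketch of one of the distances named above, the squared Maximum Mean Discrepancy with a Gaussian RBF kernel, computed between a reference set of embeddings and a window of new embeddings. It is an illustration under stated assumptions, not the paper's implementation: the random arrays stand in for model embeddings, and the helper names `rbf_kernel` and `mmd2` and the bandwidth `gamma` are hypothetical choices.

```python
import numpy as np

def rbf_kernel(x: np.ndarray, y: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian RBF kernel matrix k(x_i, y_j) = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 16))  # stand-in for baseline embeddings
same_dist = rng.normal(0.0, 1.0, size=(200, 16))  # window drawn from the same distribution
shifted = rng.normal(0.5, 1.0, size=(200, 16))    # window with a mean shift (simulated drift)

gamma = 1.0 / 16  # hypothetical bandwidth: 1 / embedding dimension
print("no drift:", mmd2(reference, same_dist, gamma))
print("drift:   ", mmd2(reference, shifted, gamma))
```

The shifted window yields a clearly larger value than the same-distribution window; in a monitoring setting one would flag drift when the value exceeds a threshold calibrated on drift-free data.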

📝 Abstract
Concept Drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time, leading to a degradation of the model's performance. Consequently, models deployed in production require continuous monitoring through drift detection techniques. Most drift detection methods to date are supervised, i.e., based on ground-truth labels. However, true labels are usually not available in many real-world scenarios. Although recent efforts have been made to develop unsupervised methods, they often lack the required accuracy, have a complexity that makes real-time implementation in production environments difficult, or are unable to effectively characterize drift. To address these challenges, we propose DriftLens, an unsupervised real-time concept drift detection framework. It works on unstructured data by exploiting the distribution distances of deep learning representations. DriftLens can also provide drift characterization by analyzing each label separately. A comprehensive experimental evaluation is presented with multiple deep learning classifiers for text, image, and speech. Results show that (i) DriftLens performs better than previous methods in detecting drift in $11/13$ use cases; (ii) it runs at least 5 times faster; (iii) its detected drift value is very coherent with the amount of drift (correlation $\geq 0.85$); (iv) it is robust to parameter changes.
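The per-label characterization described in the abstract can be sketched by grouping embeddings by label and computing a distance for each group. The sketch below fits a Gaussian to each group and uses the squared 2-Wasserstein distance between Gaussians (which coincides with the Fréchet distance), one of the distances mentioned in the summary. Assumptions: labels are the model's *predicted* labels, since true labels are unavailable; the function names are hypothetical; and this is an illustration, not the authors' code.

```python
import numpy as np

def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

def gaussian_w2_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Squared 2-Wasserstein (Frechet) distance between Gaussians fitted to x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    sqrt_x = _sqrtm_psd(cov_x)
    # Tr((cov_x cov_y)^{1/2}) equals Tr((sqrt_x cov_y sqrt_x)^{1/2}); the latter is symmetric PSD.
    inner = sqrt_x @ cov_y @ sqrt_x
    cross = _sqrtm_psd((inner + inner.T) / 2.0)
    mean_term = float(np.sum((mu_x - mu_y) ** 2))
    cov_term = float(np.trace(cov_x + cov_y - 2.0 * cross))
    return mean_term + cov_term

def per_label_drift(base_emb, base_labels, win_emb, win_labels) -> dict:
    """Distance per (predicted) label; labels with fewer than two samples are skipped."""
    scores = {}
    for label in np.unique(base_labels):
        b = base_emb[base_labels == label]
        w = win_emb[win_labels == label]
        if len(b) < 2 or len(w) < 2:
            continue  # a covariance estimate needs at least two samples
        scores[int(label)] = gaussian_w2_squared(b, w)
    return scores

rng = np.random.default_rng(42)
d = 8
base_emb = rng.normal(size=(600, d))
base_labels = rng.integers(0, 2, size=600)
win_emb = rng.normal(size=(300, d))
win_labels = rng.integers(0, 2, size=300)
win_emb[win_labels == 1] += 1.0  # simulate drift affecting only label 1

overall = gaussian_w2_squared(base_emb, win_emb)
by_label = per_label_drift(base_emb, base_labels, win_emb, win_labels)
print("overall:", overall)
print("per label:", by_label)
```

The overall score only indicates that something changed; the per-label scores show that label 1 accounts for the drift, which is the kind of characterization the abstract refers to.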
Problem

Research questions and friction points this paper is trying to address.

Detects concept drift in real time without supervision
Addresses high computational cost in drift detection
Characterizes and explains drift impact effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised real-time concept drift detection
Leverages deep learning representation distances
Characterizes drift by label impact analysis
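Real-time detection also requires deciding when a distance is large enough to signal drift. A minimal sketch, assuming a threshold calibrated as a high quantile of distances between random windows drawn from drift-free baseline data, followed by window-by-window monitoring of a stream. To keep the example short, it swaps in a deliberately simple distance (Euclidean distance between mean embeddings); the function names, `quantile`, `n_samples`, and `step` are hypothetical parameters, not values from the paper.

```python
import numpy as np

def mean_shift_distance(reference_mean: np.ndarray, window: np.ndarray) -> float:
    """Simplified stand-in distance: Euclidean distance between mean embeddings."""
    return float(np.linalg.norm(window.mean(axis=0) - reference_mean))

def calibrate_threshold(baseline, window_size, n_samples=200, quantile=0.99, seed=0) -> float:
    """Estimate a drift threshold from distances of random baseline windows."""
    rng = np.random.default_rng(seed)
    ref_mean = baseline.mean(axis=0)
    dists = [
        mean_shift_distance(
            ref_mean, baseline[rng.choice(len(baseline), window_size, replace=False)]
        )
        for _ in range(n_samples)
    ]
    return float(np.quantile(dists, quantile))

def monitor(stream, baseline, window_size, threshold, step=None):
    """Yield (window_index, distance, drift_flag); step < window_size gives a sliding window."""
    step = step or window_size  # default: consecutive non-overlapping windows
    ref_mean = baseline.mean(axis=0)
    starts = range(0, len(stream) - window_size + 1, step)
    for i, start in enumerate(starts):
        window = stream[start:start + window_size]
        dist = mean_shift_distance(ref_mean, window)
        yield i, dist, dist > threshold

rng = np.random.default_rng(1)
baseline = rng.normal(size=(2000, 8))
stream = np.vstack([
    rng.normal(size=(500, 8)),           # drift-free segment
    rng.normal(loc=0.5, size=(500, 8)),  # drifted segment
])
threshold = calibrate_threshold(baseline, window_size=100)
results = list(monitor(stream, baseline, window_size=100, threshold=threshold))
for i, dist, flag in results:
    print(i, round(dist, 3), flag)
```

Calibrating on the baseline keeps the per-window check cheap at run time: only one distance per window is computed online. The simple mean-based distance can be replaced by either of the distances sketched earlier.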
Salvatore Greco
Politecnico di Torino, Turin, Italy
Bartolomeo Vacchetti
Politecnico di Torino, Turin, Italy
D. Apiletti
Politecnico di Torino, Turin, Italy
Tania Cerquitelli
Full Professor, Dept. of Control and Computer Engineering, Politecnico di Torino
Automated Data Science, Explainable AI, Machine learning, Data management, Big data analytics