Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

📅 2024-06-24

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Ground-truth labels are rarely available for models deployed on unstructured data such as text, images, and speech, and existing unsupervised concept drift detectors often lack accuracy, are too computationally complex for real-time use in production, or cannot effectively characterize drift. This paper proposes DriftLens, an unsupervised real-time drift detection framework that exploits distribution distances between deep learning representations and characterizes drift by analyzing each label separately. Evaluated with multiple deep learning classifiers, it outperforms previous methods in 11 of 13 use cases, runs at least 5× faster, produces drift values that track the actual amount of drift (correlation ≥ 0.85), and is robust to parameter changes.

📝 Abstract

Concept Drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time, leading to a degradation of the model's performance. Consequently, models deployed in production require continuous monitoring through drift detection techniques. Most drift detection methods to date are supervised, i.e., based on ground-truth labels. However, true labels are usually not available in many real-world scenarios. Although recent efforts have been made to develop unsupervised methods, they often lack the required accuracy, have a complexity that makes real-time implementation in production environments difficult, or are unable to effectively characterize drift. To address these challenges, we propose DriftLens, an unsupervised real-time concept drift detection framework. It works on unstructured data by exploiting the distribution distances of deep learning representations. DriftLens can also provide drift characterization by analyzing each label separately. A comprehensive experimental evaluation is presented with multiple deep learning classifiers for text, image, and speech. Results show that (i) DriftLens performs better than previous methods in detecting drift in $11/13$ use cases; (ii) it runs at least 5 times faster; (iii) its detected drift value is very coherent with the amount of drift (correlation $\geq 0.85$); (iv) it is robust to parameter changes.
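The idea of comparing distributions of deep learning representations can be illustrated with a minimal sketch. The abstract does not name the distance, so the code below assumes a Fréchet distance between Gaussian fits of a baseline embedding set and a new window. The function name `frechet_distance_sq`, the synthetic embeddings, and the window sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def frechet_distance_sq(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussian fits of two embedding sets.

    Assumption: each set (rows = samples, columns = embedding dimensions)
    is summarized by its mean vector and covariance matrix.
    """
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory;
    # clip tiny negative values caused by floating-point error.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    d2 = np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt
    return float(max(d2, 0.0))


# Synthetic stand-ins for embeddings extracted from a deployed model.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(2000, 8))  # reference (training-time) data
same = rng.normal(0.0, 1.0, size=(500, 8))       # new window, no drift
shifted = rng.normal(1.0, 1.0, size=(500, 8))    # new window, mean has moved

score_same = frechet_distance_sq(baseline, same)
score_shifted = frechet_distance_sq(baseline, shifted)
print(f"no drift: {score_same:.3f}  drift: {score_shifted:.3f}")
```

In this sketch the drifted window scores far higher than the undrifted one. A detector would still need a threshold separating the two, and how that threshold is chosen is not specified in the abstract.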

Problem

Research questions and friction points this paper is trying to address.

Detecting concept drift in real time when no ground-truth labels are available

Reducing the computational cost that makes real-time drift detection difficult in production

Characterizing drift, which existing unsupervised methods cannot do effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised real-time concept drift detection

Leverages distribution distances between deep learning representations

Characterizes drift by label impact analysis
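Per-label characterization can be sketched as follows, assuming embeddings are grouped by the model's predicted label, since no ground truth is available at deployment time. The normalized mean-shift score used here is a deliberately simple stand-in, not the distance DriftLens uses, and `per_label_shift` and the synthetic data are hypothetical.

```python
import numpy as np


def per_label_shift(base_emb, base_lab, win_emb, win_lab):
    """Score drift separately for each predicted label.

    Stand-in score (an assumption): distance between the baseline and window
    mean embeddings for a label, divided by the baseline's average
    per-dimension standard deviation for that label.
    """
    scores = {}
    for label in np.unique(base_lab):
        ref = base_emb[base_lab == label]
        cur = win_emb[win_lab == label]
        if len(cur) == 0:
            continue  # label absent from this window, nothing to compare
        scale = ref.std(axis=0).mean() + 1e-12
        shift = np.linalg.norm(ref.mean(axis=0) - cur.mean(axis=0))
        scores[int(label)] = float(shift / scale)
    return scores


# Synthetic embeddings for three well-separated predicted classes.
rng = np.random.default_rng(1)
base_lab = rng.integers(0, 3, size=3000)
base_emb = rng.normal(size=(3000, 4)) + base_lab[:, None] * 5.0
win_lab = rng.integers(0, 3, size=900)
win_emb = rng.normal(size=(900, 4)) + win_lab[:, None] * 5.0
win_emb[win_lab == 2] += 2.0  # inject drift into label 2 only

scores = per_label_shift(base_emb, base_lab, win_emb, win_lab)
most_drifted = max(scores, key=scores.get)
print(scores, "most affected label:", most_drifted)
```

Ranking labels by their scores points to the classes most affected by drift, which is the kind of characterization the list above describes.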
