🤖 AI Summary
To address concept drift induced by non-stationary wireless environments in AI-native 6G networks, this paper proposes two label-free, model-agnostic, batch-based concept drift detectors. Both compute an expected-utility score that decides when drift has occurred and whether retraining is warranted, without ground-truth labels after deployment. This allows deployed models to be monitored continuously in use cases such as outdoor fingerprinting for localization and link-anomaly detection. Evaluated on two real-world wireless use cases, the proposed detectors outperform classical detectors (ADWIN, DDM, CUSUM) by 20–40 percentage points, achieve F1-scores of 0.94 and 1.00 in triggering retraining alarms, and reduce the false alarm rate by up to 20 percentage points compared to the best classical detectors. These results indicate that the detectors can improve the trustworthiness and operational reliability of AI models in dynamic 6G environments.
📝 Abstract
AI-native 6G networks promise unprecedented automation and performance by embedding machine-learning models throughout the radio access and core segments of the network. However, the non-stationary nature of wireless environments, caused by infrastructure changes, user mobility, and emerging traffic patterns, induces concept drift that can quickly degrade the accuracy of these models. Existing methods are generally very domain-specific or struggle with certain types of concept drift. In this paper, we introduce two unsupervised, model-agnostic, batch concept drift detectors. Both methods compute an expected-utility score to decide when concept drift has occurred and whether model retraining is warranted, without requiring ground-truth labels after deployment. We validate our framework on two real-world wireless use cases, outdoor fingerprinting for localization and link-anomaly detection, and demonstrate that both methods outperform classical detectors such as ADWIN, DDM, and CUSUM by 20–40 percentage points. Additionally, they achieve F1-scores of 0.94 and 1.00 in correctly triggering retraining alarms, reducing the false alarm rate by up to 20 percentage points compared to the best classical detectors.
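The abstract does not specify how the detectors or the expected-utility score are computed, so the following is only an illustrative sketch of the general idea of label-free, batch-based drift detection with a utility-driven retraining decision. It is not the authors' method. It assumes a single numeric input feature (for example a signal-strength reading), uses a two-sample Kolmogorov-Smirnov statistic as a stand-in drift score, and treats that score as a rough proxy for the probability of drift. The names `ks_statistic`, `expected_utility`, `should_retrain`, `stale_model_loss`, and `retrain_cost` are hypothetical.

```python
import bisect
import random


def ks_statistic(reference, batch):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the reference data and the new batch (in [0, 1])."""
    ref_sorted, batch_sorted = sorted(reference), sorted(batch)
    largest_gap = 0.0
    for x in ref_sorted + batch_sorted:
        cdf_ref = bisect.bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_batch = bisect.bisect_right(batch_sorted, x) / len(batch_sorted)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_batch))
    return largest_gap


def expected_utility(drift_score, stale_model_loss, retrain_cost):
    """Assumed toy utility: expected loss avoided by retraining minus the
    cost of retraining. drift_score stands in for the drift probability."""
    return drift_score * stale_model_loss - retrain_cost


def should_retrain(reference, batch, stale_model_loss=10.0, retrain_cost=2.0):
    """Raise a retraining alarm when the expected utility is positive.
    Only unlabeled inputs are used; no ground-truth labels are needed."""
    score = ks_statistic(reference, batch)
    return expected_utility(score, stale_model_loss, retrain_cost) > 0.0


# Synthetic data, for illustration only (not from the paper's datasets).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]      # training-time inputs
stable_batch = [rng.gauss(0.0, 1.0) for _ in range(500)]   # same distribution
drifted_batch = [rng.gauss(2.0, 1.0) for _ in range(500)]  # shifted mean

print("stable batch  -> retrain:", should_retrain(reference, stable_batch))
print("drifted batch -> retrain:", should_retrain(reference, drifted_batch))
```

In this sketch the cost parameters set the alarm threshold implicitly: with the default values, an alarm is raised only when the drift score exceeds `retrain_cost / stale_model_loss`, that is 0.2. A real deployment would need a calibrated drift probability and utility terms derived from the application.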