🤖 AI Summary
This work addresses concept drift in data streams, which often degrades model performance, by proposing FiCSUM, a framework that distinguishes between emerging and recurring concepts. FiCSUM combines supervised and unsupervised meta-information features into a discriminative “concept fingerprint,” and uses a dynamic weighting mechanism to learn which features signal concept change in a given dataset. The approach proceeds through meta-feature extraction, dynamic weighting, fingerprint vector construction, and similarity-based detection, improving the detection of both new and reappearing concepts. In experiments on 11 real-world and synthetic datasets, FiCSUM outperforms state-of-the-art methods in both accuracy and in modeling the underlying concept drift.
📝 Abstract
Streaming sources of data are becoming more common as the ability to collect data in real time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real-world and synthetic datasets, both in accuracy and in modeling the underlying concept drift.