Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information

📅 2026-03-11
🤖 AI Summary
This work addresses concept drift in data streams, which often degrades model performance, by proposing FiCSUM, a framework that distinguishes between emerging and recurring concepts. FiCSUM combines supervised and unsupervised meta-information features into discriminative "concept fingerprints." A dynamic weighting mechanism learns which meta-information features describe concept drift in a given dataset. The approach proceeds through meta-feature extraction, dynamic weighting, fingerprint vector construction, and similarity-based detection, improving the detection of both new and reappearing concepts. Experiments on 11 real-world and synthetic datasets show that FiCSUM outperforms state-of-the-art methods in both accuracy and modeling of the underlying concept drift.

📝 Abstract
Streaming sources of data are becoming more common as the ability to collect data in real time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environmental conditions. Representing concepts (stationary periods featuring similar behaviour) is a key idea in adapting to concept drift. By testing the similarity of a concept representation to a window of observations, we can detect concept drift to a new or previously seen recurring concept. Concept representations are constructed using meta-information features, values describing aspects of concept behaviour. We find that previously proposed concept representations rely on small numbers of meta-information features. These representations often cannot distinguish concepts, leaving systems vulnerable to concept drift. We propose FiCSUM, a general framework to represent both supervised and unsupervised behaviours of a concept in a fingerprint, a vector of many distinct meta-information features able to uniquely identify more concepts. Our dynamic weighting strategy learns which meta-information features describe concept drift in a given dataset, allowing a diverse set of meta-information features to be used at once. FiCSUM outperforms state-of-the-art methods over a range of 11 real-world and synthetic datasets in both accuracy and modeling underlying concept drift.
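To make the idea of comparing a concept fingerprint against a window of observations concrete, the following is a minimal sketch in Python. It is not the FiCSUM implementation. The four meta-information features, the weighted cosine similarity, the spread-based weighting heuristic, the 0.9 threshold, and every function name (`fingerprint`, `weighted_cosine`, `spread_weights`, `is_drift`) are assumptions chosen purely for illustration.

```python
import math
from statistics import mean, pstdev


def fingerprint(window):
    """Build a toy fingerprint from a window of (x, y_true, y_pred) tuples.

    The four features below are illustrative assumptions, not the
    feature set used by FiCSUM.
    """
    xs = [x for x, _, _ in window]
    labels = [float(y) for _, y, _ in window]
    errors = [1.0 if y != p else 0.0 for _, y, p in window]
    return [
        mean(xs),      # unsupervised: mean of the input feature
        pstdev(xs),    # unsupervised: spread of the input feature
        mean(labels),  # supervised: rate of positive labels
        mean(errors),  # supervised: classifier error rate
    ]


def weighted_cosine(a, b, weights):
    """Cosine similarity in which each dimension is scaled by a weight."""
    dot = sum(w * u * v for w, u, v in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * u * u for w, u in zip(weights, a)))
    norm_b = math.sqrt(sum(w * v * v for w, v in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spread_weights(fingerprints, eps=1e-9):
    """Hypothetical weighting heuristic: features that vary more across
    stored fingerprints receive more weight. Weights sum to roughly 1."""
    spreads = [pstdev(column) for column in zip(*fingerprints)]
    total = sum(spreads) + eps
    return [s / total for s in spreads]


def is_drift(concept_fp, window_fp, weights, threshold=0.9):
    """Flag drift when similarity drops below an assumed threshold."""
    return weighted_cosine(concept_fp, window_fp, weights) < threshold


# Two toy windows with different behaviour: (x, y_true, y_pred).
window_a = [(1.0, 0, 0), (1.2, 0, 0), (0.8, 0, 0), (1.0, 0, 0)]
window_b = [(5.0, 1, 0), (5.5, 1, 1), (4.5, 1, 0), (5.0, 1, 1)]

fp_a = fingerprint(window_a)
fp_b = fingerprint(window_b)
weights = spread_weights([fp_a, fp_b])

sim_same = weighted_cosine(fp_a, fp_a, weights)
sim_diff = weighted_cosine(fp_a, fp_b, weights)
print(f"same concept: {sim_same:.4f}, different concept: {sim_diff:.4f}")
```

In this toy setup the raw features sit on very different scales, so the feature with the largest magnitude dominates the similarity. A practical system would likely normalize each meta-information feature to a comparable range before comparing fingerprints, and would tune the threshold per dataset.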
Problem

Research questions and friction points this paper is trying to address.

concept drift
data streams
concept representation
meta-information
fingerprinting
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept drift
data streams
meta-information
fingerprinting
dynamic weighting