🤖 AI Summary
Stream learning must balance computational efficiency against model accuracy, particularly under shifting data distributions and concept drift. This paper proposes a data selection mechanism based on learnable fingerprints, which casts coreset construction as a dynamically parameterized process and enables adaptive online buffer updates. The contributions are threefold: (i) the first end-to-end learnable fingerprint embedding for streaming data; (ii) a fingerprint-guided coreset optimization framework combined with gradient-aware sample selection; and (iii) real-time buffer management that supports streams arriving at different rates. Experiments on standard stream learning benchmarks show that the method improves classification accuracy by 15.99% to 51.24% across data arrival rates and increases training throughput by 4.6× relative to state-of-the-art approaches, while keeping memory overhead low and remaining robust to concept drift.
📝 Abstract
Stream Learning (SL) requires models that adapt quickly to continuously evolving data, which poses significant challenges for both computational efficiency and learning accuracy. Effective data selection is critical in SL because it determines the balance between information retention and training efficiency. Traditional rule-based data selection methods struggle to accommodate the dynamic nature of streaming data, and recent approaches to handling changing data distributions have limitations that reduce their effectiveness in fast-paced environments. In response, we propose StreamFP, which employs dynamic, learnable parameters called fingerprints to make data selection in stream learning more efficient and adaptable. StreamFP uses a fingerprint-guided mechanism to optimize coreset selection for efficient training, and it updates its buffer robustly in response to changes in the data. Experimental results show that StreamFP outperforms state-of-the-art methods, improving accuracy by 15.99%, 29.65%, and 51.24% over the baselines at different data arrival rates and increasing training throughput by 4.6×.
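To make the two operations named in the abstract concrete, coreset selection from an incoming mini-batch and a fixed-size buffer update, here is a minimal generic sketch. It is not StreamFP's fingerprint mechanism, which the abstract does not specify. Instead, it assumes each sample already has a scalar importance score (a hypothetical stand-in, e.g. per-sample loss or gradient norm). The names `select_coreset`, `ScoreBuffer`, and `Scored` are illustrative only.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Scored:
    # Ordering compares only the score, so the heap ranks samples by importance.
    score: float
    sample_id: int = field(compare=False)


def select_coreset(scores, k):
    """Return ids of the k highest-scoring samples in a mini-batch.

    `scores` maps sample id -> importance score. The score is a hypothetical
    stand-in; StreamFP derives selection from learned fingerprints instead.
    """
    return [sid for sid, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]


class ScoreBuffer:
    """Fixed-capacity buffer that keeps the highest-scoring samples seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap: the lowest-scoring buffered sample sits at index 0

    def update(self, scores):
        for sid, s in scores.items():
            item = Scored(s, sid)
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, item)  # buffer not full yet: just add
            elif s > self._heap[0].score:
                heapq.heapreplace(self._heap, item)  # evict the lowest-scoring sample

    def ids(self):
        return sorted(item.sample_id for item in self._heap)


if __name__ == "__main__":
    batch = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.7}
    print(select_coreset(batch, 2))  # [0, 3]
    buf = ScoreBuffer(capacity=3)
    buf.update(batch)
    buf.update({4: 0.8, 5: 0.05})
    print(buf.ids())  # [0, 3, 4]
```

A pure keep-the-top policy like this never forgets old high-scoring samples, so under concept drift one would typically re-score or age buffered samples. The abstract's emphasis on adaptive buffer updates suggests StreamFP addresses this, but its exact update rule is not given here.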