Prediction-Oriented Subsampling from Data Streams

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of balancing sampling efficiency and information preservation in offline learning under data stream settings, this paper proposes a prediction-oriented information-theoretic subsampling framework. Unlike conventional approaches that maximize input data entropy, our method guides sampling decisions by minimizing posterior uncertainty of downstream prediction tasks. It incorporates a lightweight model-aware mechanism to ensure sampling stability and computational tractability. Extensive experiments on time-series forecasting and anomaly detection demonstrate that the proposed method significantly outperforms existing information-theoretic baselines: it achieves an average 12.7% reduction in prediction error at equivalent sampling rates, while maintaining scalable computational overhead. The core contribution lies in the first explicit formulation of predictive uncertainty as a principled subsampling criterion—unifying theoretical interpretability with practical performance.

Technology Category

Application Category

📝 Abstract
Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
Problem

Research questions and friction points this paper is trying to address.

Efficient subsampling from continuous data streams
Reducing uncertainty in prediction-oriented models
Balancing computational costs with information retention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intelligent data subsampling for offline learning
Information-theoretic method reduces prediction uncertainty
Careful model design ensures strong performance
🔎 Similar Papers
No similar papers found.