DRIP: DRop unImportant data Points -- Enhancing Machine Learning Efficiency with Grad-CAM-Based Real-Time Data Prioritization for On-Device Training

📅 2025-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the trade-off between storage overhead and model accuracy when training on resource-constrained edge devices, this paper proposes DRIP, an online data-selection algorithm for embedded systems. DRIP computes a Grad-CAM-based DRIP Score for each incoming data point and uses it to decide on the fly whether to store the point for potential retraining or discard it, without requiring access to the full dataset. Evaluated on four benchmark datasets, DRIP achieves storage savings of up to 39% while matching or surpassing the accuracy of models trained on the entire dataset. According to the authors, it is the first algorithm to make online data-retention decisions without access to the entire dataset.

📝 Abstract

Selecting data points for model training is critical in machine learning. Effective selection methods can reduce the labeling effort, optimize on-device training for embedded systems with limited data storage, and enhance the model performance. This paper introduces a novel algorithm that uses Grad-CAM to make online decisions about retaining or discarding data points. Optimized for embedded devices, the algorithm computes a unique DRIP Score to quantify the importance of each data point. This enables dynamic decision-making on whether a data point should be stored for potential retraining or discarded without compromising model performance. Experimental evaluations on four benchmark datasets demonstrate that our approach can match or even surpass the accuracy of models trained on the entire dataset, all while achieving storage savings of up to 39%. To our knowledge, this is the first algorithm that makes online decisions about data point retention without requiring access to the entire dataset.
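To make the Grad-CAM step concrete, here is a minimal pure-Python sketch of the standard Grad-CAM heatmap computation, followed by a toy importance score. The abstract does not give the DRIP Score formula, so `toy_importance` is a hypothetical stand-in (the fraction of heatmap cells at or above half the peak value) and should not be read as the authors' definition. The function names and the nested-list data layout are likewise assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Standard Grad-CAM: ReLU of the gradient-weighted sum of feature maps.

    activations[k][i][j] is channel k of a convolutional layer's output;
    gradients[k][i][j] is the derivative of the class score with respect
    to activations[k][i][j].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of that channel's gradients.
    alphas = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that increase the class score.
    return [[max(0.0, v) for v in row] for row in cam]


def toy_importance(cam: Matrix) -> float:
    """Hypothetical score, NOT the paper's DRIP Score.

    Returns the fraction of heatmap cells at or above half the peak value.
    """
    cells = [v for row in cam for v in row]
    peak = max(cells)
    if peak <= 0.0:
        return 0.0
    return sum(1 for v in cells if v >= 0.5 * peak) / len(cells)


# Two 2x2 feature maps with constant gradients per channel.
activations = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
heatmap = grad_cam(activations, gradients)
score = toy_importance(heatmap)
```

In a real on-device setting the activations and gradients would come from a forward and backward pass through the model; here they are hard-coded so the sketch stays self-contained.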

Problem

Research questions and friction points this paper is trying to address.

Prioritize data points for efficient on-device training

Reduce storage needs without losing model accuracy

Enable real-time data retention decisions using Grad-CAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Grad-CAM for real-time data prioritization

Computes DRIP Score for dynamic data retention

Optimizes on-device training with storage savings
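The retention idea in the points above can be sketched as a one-pass filter over a data stream. This is an illustrative sketch only: `filter_stream`, the fixed `threshold`, and the rule "keep when the score is at least the threshold" are assumptions, since the source does not state how the DRIP Score is turned into a keep-or-discard decision. The storage saving is computed under the further assumption that all samples are the same size.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def filter_stream(
    stream: Iterable[T],
    score_fn: Callable[[T], float],
    threshold: float,
) -> Tuple[List[T], float]:
    """Decide per sample whether to keep it, seeing each sample only once.

    Returns the retained samples and the fraction of storage saved
    relative to storing every sample (equal-sized samples assumed).
    """
    kept: List[T] = []
    seen = 0
    for sample in stream:
        seen += 1
        # Hypothetical decision rule: keep samples scoring at or above threshold.
        if score_fn(sample) >= threshold:
            kept.append(sample)
    saved = 1.0 - len(kept) / seen if seen else 0.0
    return kept, saved


# Toy usage: each "sample" is simply its own precomputed importance score.
kept, saved = filter_stream([0.1, 0.9, 0.4, 0.7, 0.2], lambda s: s, 0.5)
```

Because the decision for each sample depends only on that sample's score, the filter never needs the full dataset in memory, which is the property the paper emphasizes.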

🔎 Similar Papers

Multiple importance sampling for stochastic gradient estimation