🤖 AI Summary
To address the reduced generalization and large train-test performance gaps caused by overfitting in machine learning models, this paper proposes an attribution-driven data augmentation method. Leveraging interpretability techniques—particularly Layer-wise Relevance Propagation (LRP)—the approach identifies the input regions that contribute most to a model's predictions and applies targeted, relevance-guided masking during augmentation, replacing conventional random occlusion or Dropout. This work establishes an explicit coupling between attribution-based explanation and data augmentation, realizing "explanation-as-augmentation." Experiments across multiple benchmark datasets demonstrate that the proposed method improves robustness to occlusion, broadens feature utilization, and enhances inference-time generalization, thereby narrowing the train-test performance gap.
📝 Abstract
Overfitting is a well-known issue extending even to state-of-the-art (SOTA) Machine Learning (ML) models, resulting in reduced generalization and a significant train-test performance gap. Mitigation measures include a combination of dropout, data augmentation, weight decay, and other regularization techniques. Among the various data augmentation strategies, occlusion is a prominent technique that typically masks random regions of the input during training. Most of the existing literature emphasizes randomness in selecting and modifying input features rather than targeting the regions that strongly influence model decisions. We propose Relevance-driven Input Dropout (RelDrop), a novel data augmentation method that selectively occludes the most relevant regions of the input, nudging the model to use other important features in the prediction process and thus improving generalization through informed regularization. We further conduct qualitative and quantitative analyses to study how RelDrop affects model decision-making. Through a series of experiments on benchmark datasets, we demonstrate that our approach improves robustness towards occlusion, results in models utilizing more features within the region of interest, and boosts inference-time generalization performance. Our code is available at https://github.com/Shreyas-Gururaj/LRP_Relevance_Dropout.
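The core idea can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a per-element relevance map has already been computed (e.g. via LRP with a toolkit such as zennit or Captum), and it masks the top-k most relevant individual elements, whereas the paper may occlude contiguous regions and choose parameters differently.

```python
import numpy as np

def reldrop_mask(x, relevance, drop_fraction=0.1):
    """Occlude the most relevant input elements (illustrative sketch).

    x          : input array (e.g. an image).
    relevance  : attribution map of the same shape, e.g. from LRP.
    Zeroes out the `drop_fraction` most relevant elements so the
    model is nudged to rely on other informative features.
    """
    flat = relevance.ravel()
    k = max(1, int(round(drop_fraction * flat.size)))
    # Indices of the k largest relevance values.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.ones(flat.size)
    mask[top_idx] = 0.0  # drop the most relevant elements
    return x * mask.reshape(x.shape)

# Toy example: a 4x4 "image" whose top-left pixel is most relevant.
x = np.ones((4, 4))
rel = np.zeros((4, 4))
rel[0, 0] = 1.0
out = reldrop_mask(x, rel, drop_fraction=0.1)  # zeroes 2 of 16 elements
```

In training, such a mask would be recomputed per sample from the current model's attributions, so the occluded regions track what the model actually relies on, unlike fixed random occlusion.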