Machine Learning from Explanations

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address overfitting, spurious correlations, and the lack of interpretable reasoning in few-shot learning—problems exacerbated by scarce and noisy labels—this paper proposes an explanation-guided, two-stage iterative training framework. Methodologically, it pioneers the use of input-feature-importance explanations (e.g., gradient- or attention-based attributions) as weak supervision signals integrated into model training: Stage I constrains the model's feature attention using the explanation signals; Stage II refines the explanations using the updated model, and the two stages alternate iteratively. This co-adaptive mechanism jointly optimizes prediction accuracy and alignment with human-understandable decision logic. Experiments demonstrate that the approach significantly improves generalization and robustness under few-shot settings, class imbalance, and in the presence of distracting features; accelerates convergence; and enhances consistency between predictions and post-hoc explanations.
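The core idea of Stage I—using feature-importance annotations as a weak supervision signal—can be sketched as a regularized loss. The sketch below is a minimal illustration, not the paper's actual objective: it assumes a logistic-regression model, a simple gradient×input attribution, and an L1 penalty on attribution mass falling outside the annotated important features; all function and parameter names (`explanation_loss`, `important`, `lam`) are hypothetical.

```python
import numpy as np

def explanation_loss(w, x, y, important, lam=1.0):
    """Hypothetical sketch: cross-entropy loss for a logistic model plus a
    penalty that pushes gradient*input attributions onto the features the
    human explanation marks as important (`important` is a boolean mask)."""
    z = x @ w
    p = 1.0 / (1.0 + np.exp(-z))  # sigmoid prediction
    ce = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    attr = np.abs(w * x)          # simple gradient*input attribution
    penalty = attr[~important].sum()  # attribution mass on unimportant features
    return ce + lam * penalty
```

Under this toy loss, a model that relies on the annotated feature incurs no penalty, while one that achieves the same prediction through a spurious feature is penalized—mirroring how explanation signals can break ties that labels alone cannot.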

📝 Abstract
Acquiring and training on large-scale labeled data can be impractical due to cost constraints. Additionally, the use of small training datasets can result in considerable variability in model outcomes, overfitting, and learning of spurious correlations. A crucial shortcoming of data labels is their lack of any reasoning behind a specific label assignment, causing models to learn any arbitrary classification rule as long as it aligns data with labels. To overcome these issues, we introduce an innovative approach for training reliable classification models on smaller datasets, by using simple explanation signals such as important input features from labeled data. Our method centers around a two-stage training cycle that alternates between enhancing model prediction accuracy and refining its attention to match the explanations. This instructs models to grasp the rationale behind label assignments during their learning phase. We demonstrate that our training cycle expedites the convergence towards more accurate and reliable models, particularly for small, class-imbalanced training data, or data with spurious features.
Problem

Research questions and friction points this paper is trying to address.

High cost and impracticality of large-scale labeled data
Variability and overfitting from small training datasets
Lack of reasoning behind label assignments in models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses explanation signals for training
Two-stage cycle enhances accuracy and attention
Improves convergence for small imbalanced data
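The alternation between the two stages might look roughly as follows. This is a speculative sketch under strong simplifying assumptions—a logistic model trained by gradient descent, weight-magnitude attributions, and a mean-threshold rule for refreshing the explanation mask; the names `two_stage_train`, `mask`, and `lam` are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_stage_train(X, y, mask, rounds=5, steps=200, lr=0.1, lam=0.5):
    """Hypothetical sketch of the alternating cycle.
    Stage I: fit the model while penalizing weight on features outside
    the current explanation mask. Stage II: refresh the mask from the
    updated model's attributions. `mask` starts as the human annotation."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(rounds):
        # Stage I: gradient descent on cross-entropy + attribution penalty
        for _ in range(steps):
            p = sigmoid(X @ w)
            grad_ce = X.T @ (p - y) / len(y)
            grad_pen = np.sign(w) * (~mask)  # subgradient of |w| off-mask
            w -= lr * (grad_ce + lam * grad_pen)
        # Stage II: refine the explanation from the model's attributions
        attr = np.abs(w)
        mask = attr > attr.mean()
    return w, mask
```

On data where one feature carries the label and another is noise, the cycle concentrates weight (and the refreshed mask) on the informative feature, which is the qualitative behavior the summary describes.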
Jiashu Tao
School of Computing, National University of Singapore, Singapore, Singapore
Reza Shokri
Google; NUS (on leave)
Data Privacy · Trustworthy Machine Learning · Computer Security