Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of existing synthetic-data-based learning methods to synthesis-induced biases and artifacts, which can lead models to rely on spurious correlations in non-target regions. To mitigate this, the authors use provenance information available from the synthetic data generation process, which records whether each input region originates from the target object, as a supervisory signal. By decomposing input gradients into target and non-target components and explicitly suppressing the non-target component, the method steers the model toward discriminative features of the target. The technique requires no additional annotations, and the reported experiments show gains across multiple tasks, including weakly supervised object localization, spatio-temporal action localization, and image classification.
📝 Abstract
Learning methods using synthetic data have attracted attention as an effective approach for increasing the diversity of training data while reducing collection costs, thereby improving the robustness of model discrimination. However, many existing methods improve robustness only indirectly through the diversification of training samples and do not explicitly teach the model which regions in the input space truly contribute to discrimination; consequently, the model may learn spurious correlations caused by synthesis biases and artifacts. Motivated by this limitation, this paper proposes a learning framework that uses provenance information obtained during the training data synthesis process, indicating whether each region in the input space originates from the target object, as an auxiliary supervisory signal to promote the acquisition of representations focused on target regions. Specifically, input gradients are decomposed based on information about target and non-target regions during synthesis, and input gradient guidance is introduced to suppress gradients over non-target regions. This suppresses the model's reliance on non-target regions and directly promotes the learning of discriminative representations for target regions. Experiments demonstrate the effectiveness and generality of the proposed method across multiple tasks and modalities, including weakly supervised object localization, spatio-temporal action localization, and image classification.
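The gradient decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: a toy logistic model, a 0/1 provenance mask `target_mask` (1 where the input comes from the target object), a squared-L2 penalty on the non-target part of the input gradient, and a weighting factor `lam`. All function and variable names are hypothetical and not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx_i = (p - y) * w_i for this model
    return loss, grad_x


def decompose(grad_x, target_mask):
    """Split the input gradient into target and non-target parts using a 0/1 mask."""
    g_target = [g * m for g, m in zip(grad_x, target_mask)]
    g_nontarget = [g * (1 - m) for g, m in zip(grad_x, target_mask)]
    return g_target, g_nontarget


def guided_loss(w, b, x, y, target_mask, lam=1.0):
    """Task loss plus an assumed squared-L2 penalty on the non-target input gradient."""
    loss, grad_x = loss_and_input_grad(w, b, x, y)
    _, g_nontarget = decompose(grad_x, target_mask)
    penalty = sum(g * g for g in g_nontarget)
    return loss + lam * penalty, loss, penalty


# Hypothetical 4-dimensional input: the first two entries come from the target
# object, the last two from the synthesized background.
x = [1.0, 0.5, -0.3, 0.8]
target_mask = [1, 1, 0, 0]

w_spurious = [0.2, 0.1, 1.5, -1.0]  # leans on the non-target entries
w_focused = [1.0, 0.8, 0.0, 0.0]    # ignores the non-target entries

for name, w in [("spurious", w_spurious), ("focused", w_focused)]:
    total, base, penalty = guided_loss(w, 0.0, x, 1, target_mask, lam=1.0)
    print(name, "task loss:", base, "non-target penalty:", penalty, "total:", total)
```

In this toy setting the penalty is zero exactly when the model's input gradient vanishes on the non-target entries, so minimizing the combined objective discourages reliance on those regions. For a deep network, the input gradient would typically come from automatic differentiation rather than a closed form, and the penalty would then be differentiated a second time with respect to the parameters.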
Problem

Research questions and friction points this paper is trying to address.

synthetic data
spurious correlations
input gradient
provenance information
discriminative representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data
provenance-based guidance
input gradient decomposition
discriminative representation learning
spurious correlation suppression