AI Summary
This work proposes a novel data augmentation method that integrates fine-grained causal information, thereby transcending the limitations imposed by traditional Markov equivalence classes. Grounded in the principle of independent causal mechanisms and additive noise models, the approach generates new samples consistent with the true underlying causal structure by resampling residuals from marginal distribution models. Theoretical analysis demonstrates that, under Gaussian linear assumptions, the augmented data preserve causal consistency. Empirical evaluations further show that predictive models trained on the augmented data achieve significantly improved accuracy. To the best of our knowledge, this study is the first to combine residual-based bootstrapping with causal structure learning, offering a principled causal perspective for data augmentation.
Abstract
Data augmentation integrates domain knowledge into a dataset by making domain-informed modifications to existing data points. For example, image data can be augmented by duplicating images in different tints or orientations, thereby incorporating the knowledge that images may vary along these dimensions. Recent work by Teshima and Sugiyama has explored the integration of causal knowledge (e.g., A causes B causes C) up to conditional independence equivalence. We suggest a related approach for settings with additive noise that can incorporate information beyond a Markov equivalence class. The approach, built on the principle of independent mechanisms, permutes the residuals of models built on marginal probability distributions. Predictive models built on our augmented data demonstrate improved accuracy, for which we provide theoretical backing in linear Gaussian settings.
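To make the residual-permutation idea concrete, here is a minimal sketch under illustrative assumptions: a known causal order on a linear Gaussian chain A → B → C, each mechanism fit by least squares, and residuals permuted independently per mechanism before being pushed back through the fitted equations in causal order. The variable names, simulated coefficients, and the `augment` helper are all hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate data from a linear Gaussian chain A -> B -> C
# (coefficients chosen arbitrarily for illustration).
A = rng.normal(size=n)
B = 2.0 * A + rng.normal(scale=0.5, size=n)
C = -1.5 * B + rng.normal(scale=0.5, size=n)

def augment(A, B, C, rng):
    """Generate one augmented sample set by permuting residuals."""
    # Fit each structural equation by least squares (linear ANM).
    b_on_a = np.polyfit(A, B, 1)
    c_on_b = np.polyfit(B, C, 1)
    # Residuals estimate the independent noise of each mechanism.
    res_b = B - np.polyval(b_on_a, A)
    res_c = C - np.polyval(c_on_b, B)
    # Resample the root, permute residuals independently (they are
    # exchangeable under independent mechanisms), then propagate
    # through the fitted mechanisms in causal order.
    A_new = rng.permutation(A)
    B_new = np.polyval(b_on_a, A_new) + rng.permutation(res_b)
    C_new = np.polyval(c_on_b, B_new) + rng.permutation(res_c)
    return A_new, B_new, C_new

A_aug, B_aug, C_aug = augment(A, B, C, rng)
```

Because the permuted residuals remain independent of the regenerated parents, the augmented sample is drawn from (an estimate of) the same structural equations, which is the sense in which the paper's linear Gaussian analysis says causal consistency is preserved.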