🤖 AI Summary
To address the degradation of subgroup robustness caused by compositional shifts, where the training data does not cover all attribute combinations, this paper proposes CoInD, a synthetic data generation framework. CoInD reflects the compositional structure of the world during synthesis by enforcing conditional independence between attributes through a Fisher divergence regularizer that aligns the joint and marginal conditional distributions learned by a conditional diffusion model. This allows CoInD to generate faithful synthetic data covering all attribute combinations, which in turn improves worst-group generalization under out-of-distribution compositional shifts. Evaluated on the CelebA compositional shift benchmark, CoInD achieves state-of-the-art worst-group accuracy while producing synthetic samples of superior fidelity, mitigating failures in generalizing to unseen attribute combinations without requiring additional real-world annotations.
📝 Abstract
Machine learning systems struggle with robustness under subpopulation shifts. This problem becomes especially pronounced when only a subset of attribute combinations is observed during training, a severe form of subpopulation shift referred to as compositional shift. To address this problem, we ask the following question: Can we improve robustness by training on synthetic data that spans all possible attribute combinations? We first show that training conditional diffusion models on such limited data leads to learning an incorrect underlying distribution. As a result, synthetic data sampled from these models is unfaithful and does not improve the performance of downstream machine learning systems. To address this problem, we propose CoInD, which reflects the compositional nature of the world by enforcing conditional independence through minimizing Fisher's divergence between joint and marginal distributions. We demonstrate that synthetic data generated by CoInD is faithful, and that this translates to state-of-the-art worst-group accuracy on compositional shift tasks on CelebA.
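The conditional-independence constraint described above can be made concrete at the score level: if two attributes c1 and c2 are conditionally independent given x, the joint conditional score decomposes as s(x | c1, c2) = s(x | c1) + s(x | c2) - s(x), and a Fisher-divergence-style penalty measures the squared deviation from this identity. The sketch below illustrates that penalty on toy score vectors; the function name `ci_regularizer` and the exact form of the penalty are illustrative assumptions, not the paper's verbatim loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def ci_regularizer(s_joint, s_c1, s_c2, s_uncond):
    """Fisher-divergence-style conditional-independence penalty (sketch).

    Conditional independence of attributes c1 and c2 given x implies the
    score identity
        s(x | c1, c2) = s(x | c1) + s(x | c2) - s(x),
    so we penalize the mean squared deviation from it. This is an
    illustrative form, not the paper's exact training objective.
    """
    residual = s_joint - (s_c1 + s_c2 - s_uncond)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))

# Toy scores for a batch of 4 samples in 8 dimensions.
s_c1 = rng.normal(size=(4, 8))
s_c2 = rng.normal(size=(4, 8))
s_uncond = rng.normal(size=(4, 8))

# A joint score satisfying the identity exactly incurs zero penalty.
s_joint_consistent = s_c1 + s_c2 - s_uncond
print(ci_regularizer(s_joint_consistent, s_c1, s_c2, s_uncond))

# A perturbed joint score incurs a positive penalty.
s_joint_bad = s_joint_consistent + 0.1 * rng.normal(size=(4, 8))
print(ci_regularizer(s_joint_bad, s_c1, s_c2, s_uncond))
```

In practice the three conditional scores would come from a single conditional diffusion model queried with both, one, or neither attribute (e.g. via attribute dropout), and the penalty would be added to the usual denoising loss.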