🤖 AI Summary
This work addresses the challenge of model generalization under distribution shifts when multi-environment labeled data are unavailable. Building on the anti-causal assumption that the outcome causes the observed covariates, it proposes a novel label-free domain generalization method. By regularizing the model's sensitivity to perturbations in the mean and covariance of covariates across environments, the approach leverages unlabeled multi-environment data to estimate directions of environmental change, and it enjoys worst-case optimality guarantees. As the first framework to achieve domain generalization from unlabeled data in an anti-causal setting, it demonstrates significant performance gains over existing methods on both a controlled physical system and a physiological signal dataset, confirming its effectiveness and robustness.
📝 Abstract
The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
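To make the core idea concrete, here is a minimal sketch of the mean-based variant described above, not the paper's exact algorithm: the direction of environmental change is estimated label-free as the difference of covariate means between two unlabeled environments, and a linear model is then penalized for sensitivity along that direction. All names, the linear model class, and the closed-form ridge-style solver are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled covariates from two environments; here the environment
# shift moves the covariate mean along the first coordinate axis.
X_env1 = rng.normal(0.0, 1.0, size=(500, 3))
X_env2 = rng.normal(0.0, 1.0, size=(500, 3)) + np.array([2.0, 0.0, 0.0])

# Label-free estimate of the direction of environmental change:
# the difference of the two environments' covariate means.
d = X_env2.mean(axis=0) - X_env1.mean(axis=0)
d /= np.linalg.norm(d)

# Labeled data from a single training environment (synthetic).
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=500)

def fit_with_shift_penalty(X, y, d, lam):
    """Minimize ||Xw - y||^2 + lam * (d @ w)^2 in closed form.

    The penalty term is the squared sensitivity of the linear model
    w to a mean perturbation of the covariates along direction d.
    """
    A = X.T @ X + lam * np.outer(d, d)
    return np.linalg.solve(A, X.T @ y)

w_plain = fit_with_shift_penalty(X, y, d, lam=0.0)     # ordinary least squares
w_robust = fit_with_shift_penalty(X, y, d, lam=100.0)  # penalized

# The penalized model is less sensitive to shifts along d.
print(abs(d @ w_plain) > abs(d @ w_robust))
```

The key point the sketch illustrates is that `d` is computed from unlabeled covariates alone; labels enter only through the single-environment least-squares fit, matching the abstract's claim that estimating the perturbation directions requires no labels.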