AI Summary
This study investigates the trade-off between adversarial robustness and distributional robustness, revealing that adversarial training may inadvertently amplify a model's reliance on spurious features, thereby degrading performance on minority subpopulations. By developing a theoretical framework on perturbed data and integrating measures of feature separability with robustness evaluation metrics, the work elucidates how each step of adversarial training influences distributional robustness. The analysis shows that $\ell_\infty$ adversarial perturbations can enhance distributional robustness on moderately biased datasets; this benefit persists even under high data skew when the model's simplicity bias leads it to focus on core features. These findings underscore the pivotal role of feature separability in mediating the trade-off between the two forms of robustness, cautioning against misjudging robustness outcomes when this factor is overlooked.
Abstract
Adversarial robustness refers to a model's ability to resist perturbations of its inputs, while distribution robustness evaluates the model's performance under data shifts. Although both aim to ensure reliable performance, prior work has revealed a tradeoff between adversarial and distribution robustness. Specifically, adversarial training may increase reliance on spurious features, which can harm distribution robustness, especially the performance on underrepresented subgroups. We present a theoretical analysis of adversarial and distribution robustness that provides a tractable surrogate for per-step adversarial training by studying models trained on perturbed data. Beyond the tradeoff, our work further identifies a nuanced phenomenon: $\ell_\infty$ perturbations on data with moderate bias can yield an increase in distribution robustness. Moreover, this gain in distribution robustness persists on highly skewed data when simplicity bias induces reliance on the core feature, characterized by greater feature separability. Our theoretical analysis extends the understanding of the tradeoff by highlighting its interplay with feature separability. Although the tradeoff persists in many cases, overlooking the role of feature separability may lead to misleading conclusions about robustness.
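To make the notion of training on $\ell_\infty$-perturbed data concrete, here is a minimal sketch (not the paper's construction) using a standard fact about linear models: for a linear classifier $f(x) = w^\top x$ with label $y \in \{-1, +1\}$, the worst-case $\ell_\infty$ perturbation of radius $\varepsilon$ is $x - \varepsilon\, y\, \mathrm{sign}(w)$, which shrinks the margin $y\, w^\top x$ by exactly $\varepsilon \lVert w \rVert_1$. The feature names and all numeric values below are illustrative assumptions.

```python
import numpy as np

def linf_worst_case(x, y, w, eps):
    """Closed-form worst-case l_inf attack on a linear model w @ x.

    Moving each coordinate of x by eps against the sign of y * w
    reduces the margin y * (w @ x) by eps * ||w||_1.
    """
    return x - eps * y * np.sign(w)

# Illustrative weights: index 0 plays the role of a "core" feature,
# index 1 a more separable "spurious" feature (hypothetical values).
w = np.array([1.0, 2.0])
x = np.array([0.5, 1.5])
y = 1.0
eps = 0.1

margin = y * (w @ x)                 # clean margin: 0.5 + 3.0 = 3.5
x_adv = linf_worst_case(x, y, w, eps)
adv_margin = y * (w @ x_adv)         # 3.5 - eps * ||w||_1 = 3.5 - 0.3 = 3.2

print(margin, adv_margin)
```

Training on such perturbed inputs is the kind of per-step surrogate the abstract refers to: each gradient step sees data whose margin has been reduced in proportion to the $\ell_1$ norm of the current weights, which penalizes spreading weight across many features.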