🤖 AI Summary
This paper addresses two key challenges in Wasserstein distributionally robust learning: (i) representation directions being highly sensitive to perturbations, and (ii) the difficulty of simultaneously preserving invariant structural properties and ensuring feature-level robustness to distributional shifts. To this end, we propose READ, a REpresentation-Aware Distributionally robust estimation framework. Its core contributions are threefold: (i) a multidimensional alignment parameterization that differentially models feature perturbations; (ii) a seminorm regularization theory unifying diverse robust estimators; and (iii) a representation-aware confidence region construction coupled with optimal projection selection, jointly optimizing robustness and the geometric fidelity of learned representations. READ integrates Wasserstein distributionally robust optimization, robust Wasserstein profile inference, and geometric projection optimization. Experiments on synthetic and real-world benchmarks demonstrate that READ significantly improves robust estimation accuracy, yields interpretable parameter confidence regions, and enhances cross-distribution knowledge transfer.
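To make the alignment idea concrete, the following schematic (our illustrative notation, not taken from the paper) shows how weighting the transport cost by an alignment matrix \(A\) induces a seminorm penalty, in the spirit of standard Wasserstein-DRO dualities for linear models with Lipschitz losses:

\[
c_A(x, x') = \lVert A (x - x') \rVert, \qquad
\sup_{Q \,:\, W_{c_A}(Q, \hat{P}_n) \le \delta} \mathbb{E}_Q\!\left[\ell\bigl(\beta^\top x, y\bigr)\right]
\;=\; \mathbb{E}_{\hat{P}_n}\!\left[\ell\bigl(\beta^\top x, y\bigr)\right] \;+\; \delta \,\mathrm{Lip}(\ell)\,\lVert \beta \rVert_{A,*},
\]

where \(\lVert \beta \rVert_{A,*}\) denotes the seminorm dual to \(x \mapsto \lVert A x \rVert\). Intuitively, directions that are cheap to perturb under \(c_A\) force heavy regularization of the corresponding components of \(\beta\), while directions made expensive by \(A\) (e.g., informative representation directions) are penalized lightly, so their structure is preserved.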
📝 Abstract
We propose REpresentation-Aware Distributionally robust estimation (READ), a novel framework for Wasserstein distributionally robust learning that accounts for predictive representations when guarding against distributional shifts. Unlike classical approaches that treat all feature perturbations equally, READ embeds a multidimensional alignment parameter into the transport cost, allowing the model to differentially discourage perturbations along directions associated with informative representations. This yields robustness to feature variation while preserving invariant structure. Our first contribution is a theoretical foundation: we show that seminorm regularizations for linear regression and binary classification arise as Wasserstein distributionally robust objectives, thereby providing tractable reformulations of READ and unifying a broad class of regularized estimators under the DRO lens. Second, we adopt a principled procedure for selecting the Wasserstein radius using robust Wasserstein profile inference. This further enables the construction of valid, representation-aware confidence regions for model parameters with distinct geometric features. Finally, we analyze the geometry of READ estimators as the alignment parameters vary and propose an optimization algorithm that estimates the projection of the global optimum onto this solution surface. This procedure selects among equally robust estimators while optimally constructing a representation structure. We conclude by demonstrating the effectiveness of our framework through extensive simulations and a real-world study, providing a powerful robust estimation framework grounded in representation learning.
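As a minimal toy sketch of the seminorm-regularized reformulation described above (our own illustrative code, not the paper's implementation): a least-squares objective is penalized by a seminorm `||P @ beta||`, where the hypothetical matrix `P` selects the feature directions whose perturbation the adversary can exploit; directions in the null space of `P` are treated as invariant and left unregularized. The problem is solved by plain (sub)gradient descent.

```python
import numpy as np

def seminorm_regularized_ls(X, y, P, delta=0.1, lr=0.01, iters=5000):
    """Toy seminorm-regularized least squares:
        min_beta (1/n) * ||X beta - y||^2 + delta * ||P beta||_2.
    P (an assumption of this sketch) encodes which directions are
    penalized; its null space is left free.  Solved by subgradient
    descent, since the seminorm term is nonsmooth at P beta = 0."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        grad = 2 * X.T @ (X @ beta - y) / n  # smooth least-squares part
        Pb = P @ beta
        norm = np.linalg.norm(Pb)
        if norm > 1e-12:                     # subgradient of the seminorm
            grad = grad + delta * P.T @ Pb / norm
        beta = beta - lr * grad
    return beta

# Toy data: only the first coordinate carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.05 * rng.normal(size=200)

# Penalize perturbations along coordinates 1 and 2 only.
P = np.diag([0.0, 1.0, 1.0])
beta_hat = seminorm_regularized_ls(X, y, P, delta=0.5)
print(beta_hat)
```

The informative direction (coordinate 0) lies in the null space of `P`, so its coefficient is estimated essentially unregularized, while the uninformative coordinates are shrunk toward zero, mirroring the differential treatment of perturbation directions that READ formalizes.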