Distributionally Robust Feature Selection

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses robust feature selection across sensitive subpopulations when feature acquisition is costly, aiming to construct downstream models that perform well across multiple subgroups given a limited budget of observed features. The authors propose a model-agnostic, backpropagation-free continuous-relaxation framework that minimizes the variance of the Bayes-optimal predictor. The method combines distributionally robust optimization, continuous relaxation in feature space, and noise injection to jointly improve fairness and generalization. Evaluated on synthetic data and real-world benchmarks (e.g., Adult, COMPAS), the approach selects features that significantly improve both overall downstream model performance and inter-subgroup performance parity, outperforming existing gradient-based and heuristic methods.
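Reading the summary literally, one plausible formalization of the objective is the distributionally robust selection problem below; the notation (budget k, relaxed selector s, subgroup distributions P_g) is assumed here for illustration and is not drawn from the paper.

```latex
% Assumed notation (not from the paper): s \in [0,1]^d relaxes a feature
% subset S of size at most k; P_g is the distribution of subgroup g.
% Under squared loss, the Bayes-optimal predictor f_S^*(x) = E[y | x_S]
% has risk E[Var(y | x_S)], so "minimizing the variance of the
% Bayes-optimal predictor" can be read as:
\min_{s \in [0,1]^d,\; \lVert s \rVert_1 \le k} \;
\max_{g \in \mathcal{G}} \;
\mathbb{E}_{(x,y) \sim P_g}\!\left[ \operatorname{Var}\left( y \mid x_S \right) \right]
```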

📝 Abstract
We study the problem of selecting a limited set of features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is costly, e.g., adding survey questions or deploying physical sensors, and the selected features must support high-quality downstream models for different populations. Our method frames the problem as a continuous relaxation of traditional variable selection using a noising mechanism, without requiring backpropagation through model training processes. By optimizing over the variance of a Bayes-optimal predictor, we develop a model-agnostic framework that balances the overall performance of downstream prediction across populations. We validate our approach through experiments on both synthetic datasets and real-world data.
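A minimal sketch of what such a noising-based relaxation could look like in practice; the function names, the Gaussian noise model, and the closed-form ridge fit standing in for the Bayes-optimal predictor are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of noise-based continuous relaxation for feature
# selection; names and modeling choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def soft_select(X, s, noise_scale=1.0):
    """Relax the binary keep/drop decision per feature.

    s[j] in [0, 1] interpolates between observing feature j exactly
    (s[j] = 1) and replacing it with pure noise (s[j] = 0).
    """
    noise = rng.normal(0.0, noise_scale, size=X.shape)
    return s * X + (1.0 - s) * noise

def subgroup_risks(X, y, groups, s):
    """Plug-in estimate of per-subgroup risk under soft selection.

    Uses a closed-form ridge fit per subgroup as a stand-in for the
    Bayes-optimal predictor; no backpropagation through training.
    """
    Xs = soft_select(X, s)
    risks = []
    for g in np.unique(groups):
        idx = groups == g
        Xg, yg = Xs[idx], y[idx]
        # Ridge regression in closed form (lambda = 1e-2).
        A = Xg.T @ Xg + 1e-2 * np.eye(Xg.shape[1])
        w = np.linalg.solve(A, Xg.T @ yg)
        risks.append(np.mean((Xg @ w - yg) ** 2))
    return np.array(risks)

# Toy data: 200 samples, 5 features, 2 subgroups; only features 0 and 1
# carry signal, with the sign of feature 1's effect flipped by subgroup.
X = rng.normal(size=(200, 5))
groups = rng.integers(0, 2, size=200)
y = X[:, 0] + np.where(groups == 0, 1.0, -1.0) * X[:, 1]

s = np.full(5, 0.5)                     # relaxed selection variables
print(subgroup_risks(X, y, groups, s))  # one risk per subgroup
```

Because the per-subgroup risk is evaluated with a closed-form plug-in fit, the selection variables s can be tuned by any black-box optimizer; nothing here requires differentiating through a model training loop.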
Problem

Research questions and friction points this paper is trying to address.

Selecting limited costly features for multi-subpopulation model training
Developing model-agnostic feature selection without training backpropagation
Optimizing the variance of the Bayes-optimal predictor to balance cross-population prediction performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature selection via continuous relaxation with noising
Optimizing Bayes-optimal predictor variance across populations
Model-agnostic framework balancing multi-population prediction performance
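To make the backpropagation-free aspect concrete, here is a hypothetical zeroth-order search over the relaxed selection vector; the worst_group_risk stand-in, the budget projection, and the step size are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical backprop-free search over the relaxed selection vector s,
# using random local perturbations (a simple zeroth-order method).
import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 2  # number of features, selection budget

def worst_group_risk(s):
    # Stand-in black box: pretend features 0 and 1 carry all the signal,
    # so the worst-case subgroup risk falls as they are selected.
    signal = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
    return 1.0 - float(signal @ s) / 2.0 + 0.01 * rng.normal()

def project(s):
    # Keep s in [0, 1]^d and within the budget sum(s) <= k.
    s = np.clip(s, 0.0, 1.0)
    if s.sum() > k:
        s *= k / s.sum()
    return s

s = project(np.full(d, 0.5))
best = worst_group_risk(s)
for _ in range(500):
    cand = project(s + 0.1 * rng.normal(size=d))
    r = worst_group_risk(cand)
    if r < best:          # accept only improving moves
        s, best = cand, r

print(np.round(s, 2))     # mass should drift toward features 0 and 1
```

In practice the black box would be a subgroup-risk estimator like the one sketched under the abstract; the point is only that the search needs function evaluations, not gradients through model training.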
Maitreyi Swaroop
Machine Learning Department, Carnegie Mellon University
Tamar Krishnamurti
Division of General Internal Medicine, University of Pittsburgh
Bryan Wilder
Assistant Professor of Machine Learning, Carnegie Mellon University
Artificial intelligence · optimization · machine learning · social networks