It's All in the Mix: Wasserstein Classification and Regression with Mixed Features

📅 2023-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address overfitting induced by data scarcity in mixed (continuous + discrete) feature spaces, this paper studies distributionally robust classification and regression over Wasserstein balls. It proves that these mixed-feature problems are solvable in polynomial time, overcoming the exponential scaling of prior formulations that ignore discrete structure; the proof relies on classical ellipsoid-method-based schemes that do not scale well in practice. To obtain a practical method, the authors develop a cutting-plane algorithm that, despite exponentially many constraints, admits a polynomial-time separation oracle; the algorithm is efficient in practice, though exponential-time in the worst case. Empirically, the method significantly outperforms existing Wasserstein-based approaches that are agnostic to discrete feature structure on standard benchmark instances.
📝 Abstract
Problem definition: A key challenge in supervised learning is data scarcity, which can cause prediction models to overfit to the training data and perform poorly out of sample. A contemporary approach to combat overfitting is offered by distributionally robust problem formulations that consider all data-generating distributions close to the empirical distribution derived from historical samples, where 'closeness' is determined by the Wasserstein distance. While such formulations show significant promise in prediction tasks where all input features are continuous, they scale exponentially when discrete features are present.

Methodology/results: We demonstrate that distributionally robust mixed-feature classification and regression problems can indeed be solved in polynomial time. Our proof relies on classical ellipsoid method-based solution schemes that do not scale well in practice. To overcome this limitation, we develop a practically efficient (yet, in the worst case, exponential time) cutting plane-based algorithm that admits a polynomial time separation oracle, despite the presence of exponentially many constraints. We compare our method against alternative techniques both theoretically and empirically on standard benchmark instances.

Managerial implications: Data-driven operations management problems often involve prediction models with discrete features. We develop and analyze distributionally robust prediction models that faithfully account for the presence of discrete features, and we demonstrate that our models can significantly outperform existing methods that are agnostic to the presence of discrete features, both theoretically and on standard benchmark instances.
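To make the Wasserstein-robust objective concrete: for the purely continuous-feature case, prior work has shown that robust regression with an absolute loss over a Wasserstein ball reduces to norm-regularized empirical risk, where the ball radius becomes the regularization weight. The sketch below illustrates only that known continuous-feature reduction, not this paper's mixed-feature algorithm; the function name `robust_objective` and the choice of the 1-norm as the dual norm are illustrative assumptions.

```python
import numpy as np

# Hedged sketch (continuous features only, NOT the paper's mixed-feature
# method): Wasserstein-robust regression with absolute loss is known to
# equal  (1/N) * sum_i |y_i - w @ x_i|  +  eps * ||w||_dual,
# where eps is the Wasserstein ball radius. The 1-norm below assumes an
# infinity-norm transport cost; both names here are illustrative.

def robust_objective(w, X, y, eps):
    empirical = np.mean(np.abs(y - X @ w))        # empirical absolute loss
    return empirical + eps * np.linalg.norm(w, 1) # radius-weighted dual norm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true  # noiseless, so the empirical loss at w_true is zero

print(robust_objective(w_true, X, y, eps=0.1))  # → 0.35 (0 + 0.1 * 3.5)
```

With noiseless data the empirical term vanishes and only the regularization term, radius times the 1-norm of the weights, remains; the paper's contribution is handling this worst case efficiently when some features are discrete.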
Problem

Research questions and friction points this paper is trying to address.

Overcoming data scarcity in supervised learning to prevent overfitting
Efficiently solving mixed-feature classification and regression with Wasserstein distance
Improving prediction models with discrete features in operations management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polynomial time solution for mixed-feature problems
Cutting plane algorithm with separation oracle
Wasserstein distance for distributionally robust models
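The cutting-plane idea named above can be sketched generically: keep only a small working set of the exponentially many constraints, and let a separation oracle either certify feasibility or return a violated constraint to add. This is a minimal generic loop under assumed interfaces (`solve_master`, `separation_oracle`), not the paper's exact algorithm or its polynomial-time oracle for Wasserstein balls over mixed features.

```python
# Generic cutting-plane loop with a separation oracle (illustrative sketch,
# not the paper's algorithm). The master problem is solved over a lazily
# grown subset of constraints; the oracle reports the most violated cut.

def cutting_plane(solve_master, separation_oracle, max_iters=100, tol=1e-6):
    cuts = []                          # active constraint set, grown lazily
    x = solve_master(cuts)
    for _ in range(max_iters):
        cut, violation = separation_oracle(x)
        if violation <= tol:           # oracle certifies (near-)feasibility
            return x, cuts
        cuts.append(cut)               # add the violated constraint
        x = solve_master(cuts)         # re-solve the tightened relaxation
    return x, cuts

# Toy instance: minimize x subject to x >= a for every a in A,
# i.e. x >= max(A), without enumerating A inside the master problem.
A = [0.3, 0.7, 0.2, 0.9, 0.5]

def solve_master(cuts):
    return max(cuts, default=0.0)      # smallest x satisfying current cuts

def separation_oracle(x):
    worst = max(A)                     # most violated lower bound
    return worst, worst - x

x_opt, cuts = cutting_plane(solve_master, separation_oracle)
print(round(x_opt, 6))                 # → 0.9, found after a single cut
```

The point of the pattern, and of the paper's oracle, is that correctness needs only a fast way to *find* a violated constraint, not to list all of them; here one cut suffices even though `A` could be exponentially large.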
Reza Belbasi
Imperial College Business School, London, United Kingdom
Aras Selvi
Imperial College Business School, London, United Kingdom
Wolfram Wiesemann
Professor of Analytics and Operations, Imperial College Business School
Stochastic Programming · Robust Optimization · Data-Driven Optimization