🤖 AI Summary
Problem: Corrupted or adverse training data and distributional shifts, common in critical domains such as energy systems, degrade the reliability of neural network predictions.
Method: This paper proposes Wasserstein distributionally robust shallow convex neural networks (WaDiRo-SCNNs). It introduces a new convex training program for shallow ReLU networks and shows that its order-1 Wasserstein distributionally robust optimization (DRO) counterpart admits an exact, tractable reformulation. The framework can enforce hard convex physical constraints during training and supports post-training stability verification through a mixed-integer convex program. The resulting training problems can be solved with open-source optimization solvers.
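The convex training idea can be illustrated with the hyperplane-arrangement reformulation of two-layer ReLU networks from the convex neural network literature (Pilanci and Ergen). Whether the paper's program uses exactly this form is an assumption here. Fixing an activation pattern D_i = diag(1[X g_i ≥ 0]) turns a ReLU neuron into a linear map, provided its weights stay in the cone (2D_i − I) X v_i ≥ 0. The minimal sketch below uses pure Python, made-up data, and hypothetical helper names (`activation_pattern`, `cone_feasible`, `convex_prediction`). It checks that the convex model Σ_i D_i X (v_i − w_i) reproduces a ReLU network whose hidden neurons are v_i (output weight +1) and w_i (output weight −1). In actual training, the v_i and w_i would be decision variables handed to a convex solver; here they are hand-picked feasible values.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def relu(t):
    return max(0.0, t)

def activation_pattern(X, g):
    # Diagonal of D = diag(1[X g >= 0]), stored as a list of 0/1 flags.
    return [1 if dot(x, g) >= 0 else 0 for x in X]

def cone_feasible(X, pattern, v):
    # Cone constraint (2D - I) X v >= 0: the sign of x.v must agree with the pattern.
    return all((2 * p - 1) * dot(x, v) >= 0 for x, p in zip(X, pattern))

def convex_prediction(X, patterns, V, W):
    # Convex model: y_hat = sum_i D_i X (v_i - w_i), which is linear in (V, W).
    return [
        sum(pat[k] * (dot(x, v) - dot(x, w)) for pat, v, w in zip(patterns, V, W))
        for k, x in enumerate(X)
    ]

def relu_network_prediction(X, V, W):
    # Shallow ReLU net: hidden neurons v_i with output weight +1, w_i with output weight -1.
    return [sum(relu(dot(x, v)) for v in V) - sum(relu(dot(x, w)) for w in W) for x in X]

random.seed(0)
n, d, m = 8, 3, 4  # samples, features, sampled activation patterns (all made up)
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
gates = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
patterns = [activation_pattern(X, g) for g in gates]

# Any nonnegative multiple of a gate vector satisfies that gate's cone constraint.
V, W = [], []
for g in gates:
    s_v, s_w = random.uniform(0, 2), random.uniform(0, 2)
    V.append([s_v * gj for gj in g])
    W.append([s_w * gj for gj in g])

feasible = all(
    cone_feasible(X, p, v) and cone_feasible(X, p, w)
    for p, v, w in zip(patterns, V, W)
)
convex = convex_prediction(X, patterns, V, W)
network = relu_network_prediction(X, V, W)
print(feasible, max(abs(p - q) for p, q in zip(convex, network)))  # gap is rounding-level
```

The equivalence only holds inside the cones: if a v_i violated (2D_i − I) X v_i ≥ 0, then D_i X v_i would no longer equal ReLU(X v_i), which is why the cone constraints are part of the convex program.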
Results: The method is evaluated on a synthetic experiment, a real-world power system application (forecasting the hourly energy consumption of non-residential buildings in virtual power plants), and benchmark datasets, where it improves generalization and robustness under distributional perturbations. The paper also provides out-of-sample performance guarantees and states that the approach scales to large industrial deployments.
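Guarantees of this kind in Wasserstein DRO typically come from bounding the worst-case risk over an order-1 Wasserstein ball around the empirical distribution. For intuition, consider a setting simpler than WaDiRo-SCNN, assumed here for illustration only: a linear predictor y ≈ w·x + b, absolute-error loss, Euclidean transport cost on the joint (x, y) space, and unrestricted support. In that setting, the worst case over a ball of radius ε has a known closed form: the empirical loss plus ε‖(w, −1)‖₂. This is a norm penalty, and if the true distribution lies in the ball, its expected loss cannot exceed this value. The sketch below uses made-up data and hypothetical function names. It computes the closed form and checks that shifting every sample by exactly ε along its loss-increasing direction attains it.

```python
import math

def residuals(X, y, w, b):
    # r_i = y_i - (w . x_i + b)
    return [yi - sum(wj * xj for wj, xj in zip(w, xi)) - b for xi, yi in zip(X, y)]

def mae(X, y, w, b):
    return sum(abs(r) for r in residuals(X, y, w, b)) / len(X)

def lipschitz_constant(w):
    # Lipschitz constant of |y - w.x - b| in (x, y) under the Euclidean norm: ||(w, -1)||_2
    return math.sqrt(sum(wj * wj for wj in w) + 1.0)

def worst_case_mae(X, y, w, b, eps):
    # Closed-form order-1 Wasserstein worst case: empirical MAE + eps * ||(w, -1)||_2
    return mae(X, y, w, b) + eps * lipschitz_constant(w)

def adversarial_shift(X, y, w, b, eps):
    # Move each sample a Euclidean distance eps along +/-(-w, 1) / ||(w, -1)||_2,
    # the direction in which its absolute residual grows fastest.
    lip = lipschitz_constant(w)
    X_adv, y_adv = [], []
    for xi, yi, r in zip(X, y, residuals(X, y, w, b)):
        s = 1.0 if r >= 0 else -1.0
        X_adv.append([xj - s * eps * wj / lip for xj, wj in zip(xi, w)])
        y_adv.append(yi + s * eps / lip)
    return X_adv, y_adv

# Made-up data and model, for illustration only.
X = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3], [1.5, 1.5]]
y = [3.1, -0.4, -1.2, 2.9]
w, b, eps = [0.8, 0.9], 0.1, 0.05

bound = worst_case_mae(X, y, w, b, eps)
X_adv, y_adv = adversarial_shift(X, y, w, b, eps)
attained = mae(X_adv, y_adv, w, b)
# Every sample moved exactly eps, so the shifted empirical distribution lies in the ball
# and its MAE matches the closed-form worst case up to rounding.
print(mae(X, y, w, b), bound, attained)
```

The radius ε sets the conservativeness: the penalty grows linearly in ε, so a larger ball trades nominal accuracy for robustness.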
📝 Abstract
In this work, we propose Wasserstein distributionally robust shallow convex neural networks (WaDiRo-SCNNs) to provide reliable nonlinear predictions when subject to adverse and corrupted datasets. Our approach is based on a new convex training program for ReLU-based shallow neural networks which allows us to cast the problem as an exact, tractable reformulation of its order-1 Wasserstein distributionally robust counterpart. Our training procedure is conservative, has low stochasticity, is solvable with open-source solvers, and is scalable to large industrial deployments. We provide out-of-sample performance guarantees, show that hard convex physical constraints can be enforced in the training program, and propose a mixed-integer convex post-training verification program to evaluate model stability. WaDiRo-SCNN aims to make neural networks safer for critical applications, such as in the energy sector. Finally, we numerically demonstrate the performance of our model on a synthetic experiment, a real-world power system application, i.e., the prediction of non-residential buildings' hourly energy consumption in the context of virtual power plants, and on benchmark datasets. The experimental results are convincing and showcase the strengths of the proposed model.
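Post-training verification of ReLU networks with mixed-integer programs usually relies on an exact encoding of each neuron y = max(0, a). A standard choice is the big-M formulation, assumed here because the abstract does not spell out the paper's encoding. It uses a binary z and valid pre-activation bounds L ≤ a ≤ U with L < 0 < U, and imposes y ≥ a, y ≥ 0, y ≤ a − L(1 − z), and y ≤ U z. The sketch below uses a hypothetical helper name and made-up bounds. It enumerates z ∈ {0, 1} and checks on a grid of pre-activations that these constraints leave exactly one feasible output, y = max(0, a). In a full verification program, such constraints would typically be stacked for every neuron and passed to a mixed-integer solver to bound how much the output can change under input perturbations.

```python
def big_m_feasible_outputs(a, L, U):
    """Return (z, y_lo, y_hi) for each binary z whose big-M constraints admit some y."""
    if not (L < 0 < U and L <= a <= U):
        raise ValueError("need L < 0 < U and L <= a <= U")
    feasible = []
    for z in (0, 1):
        y_lo = max(a, 0.0)                  # y >= a and y >= 0
        y_hi = min(a - L * (1 - z), U * z)  # y <= a - L(1 - z) and y <= U z
        if y_lo <= y_hi:
            feasible.append((z, y_lo, y_hi))
    return feasible

L, U = -2.0, 3.0  # assumed pre-activation bounds, e.g. from interval arithmetic
grid = [L + (U - L) * k / 50 for k in range(51)]

exact = True
for a in grid:
    options = big_m_feasible_outputs(a, L, U)
    # Exact encoding: some z is feasible, and every feasible z pins y to max(0, a).
    exact = exact and bool(options) and all(lo == hi == max(0.0, a) for _, lo, hi in options)
print(exact)
```

The bounds L and U must be valid for every input in the verified region; tighter bounds give a tighter continuous relaxation, which usually makes the mixed-integer solve faster.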