🤖 AI Summary
This work addresses the challenge of solving generalized linear models with cardinality constraints, where traditional branch-and-bound methods struggle to exploit GPU parallelism due to discrete variables, combinatorial structures, and nonlinear objectives. The paper introduces a CPU-GPU cooperative branch-and-bound framework that enables efficient GPU batch processing. By incorporating node padding, lightweight custom CUDA kernels, and heterogeneous scheduling, the framework achieves batched parallel evaluation of irregular search nodes. Empirical results demonstrate 10–100× speedups on challenging instances while attaining zero optimality gap. Moreover, the approach can be extended to enumerate the full Rashomon set, thereby enabling variable-importance analysis and model selection under secondary, user-specified criteria.
📝 Abstract
GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and nonlinear objectives, such as certifying optimal solutions for cardinality-constrained generalized linear models. Major challenges include the sequential processing of heterogeneous nodes in branch and bound (BnB) and frequent data movement between the CPU and GPU. We propose a simple, generic, and modular CPU-GPU framework that processes multiple BnB nodes in batches on GPUs. The framework is built around a small set of GPU-efficient routines and uses padding together with lightweight custom kernels to handle irregular node data structures. Experiments show one to two orders of magnitude speedups and zero optimality gap on challenging instances. The framework can also be extended to collect the entire Rashomon set, enabling downstream statistical analysis such as variable-importance analysis and model selection under secondary user-specific measures (e.g., AUC in classification).
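To make the padding idea concrete, the following is a minimal sketch of how BnB nodes with differently sized feature supports could be packed into one rectangular batch with a validity mask. It is not the authors' implementation: the function names `pad_nodes` and `batched_masked_sum`, the `-1` pad value, and the toy per-node reduction are all hypothetical, and plain Python lists stand in for the GPU tensors and custom kernels the abstract refers to.

```python
from typing import List, Tuple


def pad_nodes(
    supports: List[List[int]], pad_value: int = -1
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Pad variable-length index lists to a common width.

    Returns the padded rows and a mask marking which slots hold real indices.
    `pad_value` is an illustrative sentinel, not something from the paper.
    """
    width = max((len(s) for s in supports), default=0)
    padded, mask = [], []
    for s in supports:
        fill = width - len(s)
        padded.append(list(s) + [pad_value] * fill)
        mask.append([True] * len(s) + [False] * fill)
    return padded, mask


def batched_masked_sum(
    padded: List[List[int]], mask: List[List[bool]], weights: List[float]
) -> List[float]:
    """Toy per-node reduction over the rectangular batch.

    Padded slots are skipped via the mask, so they never touch `weights`.
    """
    return [
        sum(weights[j] for j, valid in zip(row, row_mask) if valid)
        for row, row_mask in zip(padded, mask)
    ]


# Three hypothetical nodes whose active feature sets have different sizes.
supports = [[0, 2], [1], [0, 1, 3]]
padded, mask = pad_nodes(supports)
totals = batched_masked_sum(padded, mask, [1.0, 2.0, 3.0, 4.0])
```

Once every node has the same width, a single batched routine can process all rows together, and the mask keeps the padding from affecting any node's result. On a GPU the same layout would typically be a dense index tensor plus a boolean mask tensor.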