AI Summary
To address the high computational cost of selecting Gaussian kernel parameters for support vector classification (SVC), which conventionally relies on extensive cross-validation, this paper proposes MaxMin-L2-SVC-NCH, a framework that formulates classifier training and kernel parameter selection as a single minimax problem: the inner minimization finds the closest points between the two classes' normalized convex hulls (L2-SVC-NCH), while the outer maximization searches for the optimal Gaussian kernel parameters. The inner problem is solved by a projected gradient algorithm (PGA) that contains sequential minimal optimization (SMO) as a special case, and the outer problem by gradient ascent with a dynamic learning rate that improves convergence stability. Evaluated on multiple benchmark datasets, the method reduces the number of required model trainings by over 90% compared to grid-search cross-validation, while achieving test accuracy comparable to optimally tuned baselines. This significantly improves hyperparameter tuning efficiency and computational scalability.
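Under assumed notation (the paper's exact symbols are not shown here), the minimax structure described above can be sketched as follows, where the inner objective is the squared distance between the two normalized convex hulls in feature space and the diagonal term comes from the L2 penalty:

```latex
\[
\max_{\gamma > 0} \;\; \min_{\boldsymbol{\alpha}} \;
\sum_{i,j} \alpha_i \alpha_j \, y_i y_j
\left( K_\gamma(\mathbf{x}_i, \mathbf{x}_j) + \frac{\delta_{ij}}{C} \right)
\quad \text{s.t.} \quad
\sum_{i:\, y_i = +1} \alpha_i = 1, \;\;
\sum_{i:\, y_i = -1} \alpha_i = 1, \;\;
\alpha_i \ge 0,
\]
\[
K_\gamma(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right).
\]
```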
Abstract
The selection of Gaussian kernel parameters plays an important role in the applications of support vector classification (SVC). A commonly used method is k-fold cross-validation with grid search (CV), which is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new approach is proposed to train SVC and optimize the selection of Gaussian kernel parameters. We first formulate the training and parameter selection of SVC as a minimax optimization problem named MaxMin-L2-SVC-NCH, in which the minimization problem is an optimization problem of finding the closest points between two normalized convex hulls (L2-SVC-NCH), while the maximization problem is an optimization problem of finding the optimal Gaussian kernel parameters. A lower time complexity can be expected from MaxMin-L2-SVC-NCH because CV is not needed. We then propose a projected gradient algorithm (PGA) for training L2-SVC-NCH. The well-known sequential minimal optimization (SMO) algorithm is a special case of the PGA, so the PGA offers more flexibility than the SMO. Furthermore, the maximization problem is solved by a gradient ascent algorithm with a dynamic learning rate. Comparative experiments between MaxMin-L2-SVC-NCH and the previous best approaches on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while maintaining competitive test accuracy. These findings indicate that MaxMin-L2-SVC-NCH is a better choice for SVC tasks.
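As a rough illustration of the two-level scheme in the abstract (this is not the authors' implementation; the function names, step sizes, and the halving learning-rate rule are illustrative assumptions), the inner projected-gradient minimization over the dual variables and the outer gradient ascent on the kernel parameter gamma might be sketched as:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex {a >= 0, sum a = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def gaussian_kernel(X, gamma):
    """Gaussian kernel matrix and the pairwise squared-distance matrix."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * D), D

def train_maxmin(X, y, C=10.0, gamma0=0.5, inner_steps=200, outer_steps=30,
                 eta_alpha=0.05, eta_gamma=0.5):
    """Sketch of the minimax idea: inner projected gradient on the dual
    variables (per-class simplex constraints), outer gradient ascent on
    gamma with a dynamic (halving) learning rate. Step sizes are ad hoc."""
    n = len(y)
    pos, neg = y > 0, y < 0
    alpha = np.zeros(n)
    alpha[pos] = 1.0 / pos.sum()   # start at the convex-hull centroids
    alpha[neg] = 1.0 / neg.sum()
    gamma = gamma0

    def objective_and_grads(alpha, gamma):
        K, D = gaussian_kernel(X, gamma)
        Q = np.outer(y, y) * (K + np.eye(n) / C)  # L2 term on the diagonal
        f = alpha @ Q @ alpha                     # squared NCH distance
        g_alpha = 2.0 * Q @ alpha
        dQ = np.outer(y, y) * (-D) * K            # dQ/dgamma
        g_gamma = alpha @ dQ @ alpha
        return f, g_alpha, g_gamma

    f_prev = -np.inf
    for _ in range(outer_steps):
        # Inner minimization: projected gradient on each class's simplex.
        for _ in range(inner_steps):
            _, g_alpha, _ = objective_and_grads(alpha, gamma)
            a = alpha - eta_alpha * g_alpha
            alpha[pos] = project_simplex(a[pos])
            alpha[neg] = project_simplex(a[neg])
        f, _, g_gamma = objective_and_grads(alpha, gamma)
        if f < f_prev:               # dynamic learning rate: shrink on non-ascent
            eta_gamma *= 0.5
        f_prev = f
        gamma = max(gamma + eta_gamma * g_gamma, 1e-6)  # keep gamma positive
    return alpha, gamma, f_prev
```

The per-class simplex projection enforces the normalized-convex-hull constraints exactly after each gradient step, which is what distinguishes this sketch from unconstrained gradient descent on the dual.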