🤖 AI Summary
This paper addresses sparse multiple kernel learning (SMKL) for binary support vector machines: selecting a sparse convex combination of kernels from a predefined kernel pool. We propose an SMKL formulation with an explicit cardinality constraint and solve it with an alternating best-response algorithm paired with a mixed-integer semidefinite programming (MISDP) relaxation scheme. The inner step solves standard kernel SVM subproblems with LIBSVM, while the outer step updates the kernel weights via greedy selection followed by simplex projection to enforce sparsity; the semidefinite relaxations certify near-optimality, and in several cases global optimality, of the returned solutions. Evaluated on ten UCI benchmark datasets, our method improves out-of-sample accuracy over the best state-of-the-art MKL baseline by 3.34 percentage points on average with random initialization and by 4.05 percentage points with warm-start initialization. It selects fewer kernels, runs in comparable time, and improves model interpretability and robustness.
📝 Abstract
We study Sparse Multiple Kernel Learning (SMKL), the problem of selecting a sparse convex combination of prespecified kernels for binary support vector machine classification. Unlike prevailing ℓ1-regularized approaches, which only approximate a sparsifying penalty, we formulate the problem with an explicit cardinality constraint on the kernel weights and add an ℓ2 penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best-response algorithm with two subproblems: the alpha subproblem is a standard kernel SVM dual solved via LIBSVM, while the beta subproblem admits an efficient solution via the Greedy Selector and Simplex Projector (GSSP) algorithm. We further reformulate SMKL as a mixed-integer semidefinite optimization problem and derive a hierarchy of convex semidefinite relaxations, which can be used both to certify near-optimality of the solutions returned by our best-response algorithm and to warm-start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy by 3.34 percentage points on average (relative to the best-performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best-performing benchmark's out-of-sample accuracy by 4.05 percentage points on average. Our convex relaxations certify that, in several cases, the solution returned by our best-response algorithm is globally optimal.
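The beta subproblem reduces to a Euclidean projection onto the intersection of the probability simplex and the cardinality constraint, which GSSP handles by keeping the k largest coordinates and projecting them onto the simplex. The sketch below illustrates that projection step only; it is not the paper's implementation, and the function names and the sort-based simplex projection routine are illustrative choices.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-based method."""
    u = np.sort(v)[::-1]                      # sort entries in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)        # shift that makes the support sum to 1
    return np.maximum(v - tau, 0.0)

def gssp(w, k):
    """Greedy Selector and Simplex Projector: keep the k largest
    coordinates of w, project them onto the simplex, zero the rest.
    This is the projection onto {b : b >= 0, sum(b) = 1, ||b||_0 <= k}."""
    beta = np.zeros_like(w, dtype=float)
    top_k = np.argsort(w)[::-1][:k]           # indices of the k largest entries
    beta[top_k] = project_simplex(w[top_k])
    return beta

# Example: project a weight vector onto a 2-sparse simplex point.
beta = gssp(np.array([0.5, 0.1, 0.4, -0.2]), k=2)
```
Within the alternating scheme, such a projection would be applied to the candidate kernel-weight update in each outer iteration, so at most k kernels ever receive nonzero weight.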