🤖 AI Summary
Kernel methods face two major bottlenecks in large-scale learning: prohibitive memory overhead, and model coverage largely limited to kernel ridge regression (KRR), with little support for other important models such as kernel logistic regression and kernel SVM. This paper proposes Joker, a lightweight, efficient joint optimization framework for diverse kernel models. Its core is a dual block coordinate descent method with trust region (DBCD-TR), which extends lightweight design principles beyond KRR to these broader kernel learning tasks. Combined with Random Fourier Features (RFF) for kernel approximation, the framework reduces memory consumption by up to 90% while matching or surpassing state-of-the-art methods in training speed and accuracy. It scales kernel learning to datasets with millions of samples, supporting diverse kernel-based classifiers and regressors within a unified, memory-efficient optimization paradigm.
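The paper's DBCD-TR algorithm is not detailed here; as background, here is a minimal sketch of the generic idea of block coordinate descent, applied to a plain ridge-regression objective. The objective, function names, and block size are illustrative assumptions, not the paper's method:

```python
import numpy as np

def block_cd_ridge(A, y, lam=1.0, block=2, iters=50):
    """Cyclic block coordinate descent for min_w 0.5||Aw - y||^2 + 0.5*lam*||w||^2.

    Generic illustration only (not the paper's DBCD-TR): each step solves the
    subproblem exactly over one block of coordinates, holding the rest fixed.
    """
    n, d = A.shape
    w = np.zeros(d)
    r = y - A @ w                          # residual, maintained incrementally
    for _ in range(iters):
        for start in range(0, d, block):
            idx = slice(start, min(start + block, d))
            Ab = A[:, idx]
            H = Ab.T @ Ab + lam * np.eye(Ab.shape[1])   # block Hessian
            g = Ab.T @ r - lam * w[idx]                 # negative block gradient
            step = np.linalg.solve(H, g)                # exact block minimizer
            w[idx] += step
            r -= Ab @ step
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))
y = rng.normal(size=50)
w_bcd = block_cd_ridge(A, y, lam=0.5)
w_exact = np.linalg.solve(A.T @ A + 0.5 * np.eye(6), A.T @ y)
print(np.max(np.abs(w_bcd - w_exact)))   # should be tiny: BCD matches the closed form
```

Because each block update only touches a small slice of the problem, memory per step stays small, which is the property the framework exploits at scale.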
📝 Abstract
Kernel methods are powerful tools for nonlinear learning with well-established theory, but scalability has been their long-standing challenge. Despite existing successes, large-scale kernel methods face two limitations: (i) the memory overhead is too high for users to afford; (ii) existing efforts focus mainly on kernel ridge regression (KRR), while other models remain understudied. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90% memory while achieving training time and performance comparable to, or better than, state-of-the-art methods.
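A minimal sketch of the Random Fourier Features approximation mentioned above, for the Gaussian kernel; the function name, parameters, and feature dimension are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rff_features(X, D=2000, gamma=0.5, seed=0):
    """Random Fourier Features z(x) with z(x)^T z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies sampled from the kernel's spectral density, N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
K_approx = Z @ Z.T     # n x D features replace the n x n kernel matrix
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
err = float(np.max(np.abs(K_approx - K_exact)))
print(err)             # approximation error shrinks as O(1/sqrt(D))
```

The memory savings come from this substitution: storing the n x D feature matrix instead of the dense n x n kernel matrix, with D fixed while n grows to millions.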