🤖 AI Summary
Kernel methods face two major bottlenecks in large-scale learning: prohibitive memory overhead, and model coverage largely limited to kernel ridge regression (KRR), with little support for other important models such as kernel logistic regression and kernel SVM. This paper proposes Joker, a lightweight, efficient joint optimization framework for diverse kernel models. Its core is a dual block coordinate descent method with trust region (DBCD-TR), which extends lightweight design principles beyond KRR to these broader kernel learning tasks. Combined with Random Fourier Features (RFF) for kernel approximation, the framework reduces memory consumption by up to 90% while matching or surpassing state-of-the-art methods in training speed and accuracy. It scales kernel learning to datasets with millions of samples, supporting diverse kernel-based classifiers and regressors within a unified, memory-efficient optimization paradigm.
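The paper's DBCD-TR algorithm is not detailed here; as background, here is a minimal sketch of the generic idea of block coordinate descent, applied to a plain ridge-regression objective. The objective, function names, and block size are illustrative assumptions, not the paper's method:

```python
import numpy as np

def block_cd_ridge(A, y, lam=1.0, block=2, iters=50):
    """Cyclic block coordinate descent for min_w 0.5||Aw - y||^2 + 0.5*lam*||w||^2.

    Generic illustration only (not the paper's DBCD-TR): each step solves the
    subproblem exactly over one block of coordinates, holding the rest fixed.
    """
    n, d = A.shape
    w = np.zeros(d)
    r = y - A @ w                          # residual, maintained incrementally
    for _ in range(iters):
        for start in range(0, d, block):
            idx = slice(start, min(start + block, d))
            Ab = A[:, idx]
            H = Ab.T @ Ab + lam * np.eye(Ab.shape[1])   # block Hessian
            g = Ab.T @ r - lam * w[idx]                 # negative block gradient
            step = np.linalg.solve(H, g)                # exact block minimizer
            w[idx] += step
            r -= Ab @ step
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))
y = rng.normal(size=50)
w_bcd = block_cd_ridge(A, y, lam=0.5)
w_exact = np.linalg.solve(A.T @ A + 0.5 * np.eye(6), A.T @ y)
print(np.max(np.abs(w_bcd - w_exact)))   # should be tiny: BCD matches the closed form
```

Because each block update only touches a small slice of the problem, memory per step stays small, which is the property the framework exploits at scale.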
📝 Abstract
Kernel methods are powerful tools for nonlinear learning with well-established theory, but scalability has been their long-standing challenge. Despite existing successes, large-scale kernel methods face two limitations: (i) the memory overhead is too high for users to afford; (ii) existing efforts focus mainly on kernel ridge regression (KRR), while other models remain understudied. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90% memory while achieving training time and performance comparable to, or better than, state-of-the-art methods.
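A minimal sketch of the Random Fourier Features approximation mentioned above, for the Gaussian kernel; the function name, parameters, and feature dimension are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rff_features(X, D=2000, gamma=0.5, seed=0):
    """Random Fourier Features z(x) with z(x)^T z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies sampled from the kernel's spectral density, N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
K_approx = Z @ Z.T     # n x D features replace the n x n kernel matrix
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
err = float(np.max(np.abs(K_approx - K_exact)))
print(err)             # approximation error shrinks as O(1/sqrt(D))
```

The memory savings come from this substitution: storing the n x D feature matrix instead of the dense n x n kernel matrix, with D fixed while n grows to millions.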