🤖 AI Summary
This work addresses two drawbacks of Online Newton Step (ONS)–based methods for contextual recommendation: the high computational cost of the Mahalanobis projection step and sensitivity to suboptimal action feedback. To circumvent these issues, the authors propose a simple projection-free online learning algorithm that exploits the improperness inherent in contextual recommendation. The resulting second-order perceptron–style update rule operates naturally in a reproducing kernel Hilbert space and avoids the projection step that ONS requires. Theoretical analysis shows that the algorithm matches the $O(d \log T)$ regret bound of the ONS-based method while substantially reducing computational cost. The analysis further shows that the same algorithm is inherently robust to possibly suboptimal action feedback, eliminating the parallel multi-learning-rate mechanism employed in prior approaches.
📝 Abstract
Contextual recommendation is a variant of contextual linear bandits in which the learner observes an (optimal) action rather than a scalar reward. Recently, Sakaue et al. (2025) developed an efficient Online Newton Step (ONS) approach with an $O(d\log T)$ regret bound, where $d$ is the dimension of the action space and $T$ is the time horizon. In this paper, we present a simple algorithm that is more efficient than the ONS-based method while achieving the same regret guarantee. Our core idea is to exploit the improperness inherent in contextual recommendation, leading to an update rule akin to the second-order perceptron from online classification. This removes the Mahalanobis projection step required by ONS, which is often a major computational bottleneck. More importantly, the same algorithm remains robust to possibly suboptimal action feedback, whereas the prior ONS-based method required running multiple ONS learners with different learning rates for this extension. We describe how our method works in general Hilbert spaces (e.g., via kernelization), where eliminating Mahalanobis projections becomes even more beneficial.
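To illustrate the flavor of update the abstract refers to, here is a minimal sketch of the classical second-order perceptron (Cesa-Bianchi, Conconi, and Gentile) for binary online classification. This is *not* the paper's algorithm, only the style of second-order, mistake-driven update it builds on: the learner maintains a correlation matrix and a weighted sum of mistake-round instances, and no projection step appears anywhere in the loop.

```python
import numpy as np

def second_order_perceptron(stream, d, a=1.0):
    """Classical second-order perceptron for binary labels y in {-1, +1}.

    Illustrative sketch only (not the paper's contextual-recommendation
    algorithm). `a` is the ridge parameter of the warm-started inverse.
    """
    S = np.zeros((d, d))   # correlation matrix of instances from mistake rounds
    v = np.zeros(d)        # sum of y_t * x_t over mistake rounds
    mistakes = 0
    for x, y in stream:
        # Whitened prediction: include the current instance in the matrix.
        A = a * np.eye(d) + S + np.outer(x, x)
        w = np.linalg.solve(A, v)
        y_hat = 1.0 if w @ x >= 0 else -1.0
        if y_hat != y:     # update only on mistakes; no projection needed
            S += np.outer(x, x)
            v += y * x
            mistakes += 1
    return v, S, mistakes
```

In practice one would maintain `A`'s inverse (or a Cholesky factor) incrementally via rank-one updates rather than re-solving each round; the point here is only that the update touches no feasible set and hence needs no Mahalanobis projection.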