🤖 AI Summary
This work exposes a fundamental vulnerability of online learning to rank (OLTR) algorithms to collaborative adversarial attacks: an attacker can stealthily manipulate ranking outcomes, keeping a target item persistently in the top-K positions, without observing any user feedback. The authors propose the first "no-observation" attack paradigm of this kind, requiring only O(log T) manipulations to induce linear regret and to sustain the target's exposure for T − o(T) rounds. Building on the cascade and position-based click models, they design two tailored attack strategies, CascadeOFA and PBMOFA, targeting CascadeUCB1 and PBM-UCB respectively. They provide rigorous theoretical guarantees of efficacy and validate both strategies empirically on real-world datasets, showing that minimal manipulation drastically degrades recommendation quality. This is the first systematic study of the security risks OLTR faces in black-box, feedback-free settings, offering both a foundational warning and a benchmark for designing robust ranking algorithms.
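For readers unfamiliar with the victim algorithm, the sketch below shows a minimal CascadeUCB1 loop under the cascade click model (Kveton et al., 2015): the learner ranks the K items with the highest UCB indices, the simulated user scans the list and clicks the first attractive item, and only the prefix up to that click is observed. The variable names and simulation setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cascade_ucb1(attraction, K, T, seed=0):
    """Minimal CascadeUCB1 sketch under the cascade click model
    (Kveton et al., 2015). `attraction[i]` is item i's true click
    probability; all names here are illustrative."""
    rng = np.random.default_rng(seed)
    attraction = np.asarray(attraction)
    L = len(attraction)
    pulls = np.zeros(L)            # number of observations per item
    means = np.zeros(L)            # empirical attraction estimates
    for t in range(1, T + 1):
        # UCB index: empirical mean plus exploration bonus;
        # items never observed get an infinite index.
        bonus = np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb = np.where(pulls > 0, means + bonus, np.inf)
        ranking = np.argsort(-ucb)[:K]              # recommend top-K
        # Cascade feedback: the user clicks the first attractive item
        # and stops; items below the click are never examined.
        clicks = rng.random(K) < attraction[ranking]
        click_pos = int(np.argmax(clicks)) if clicks.any() else K
        for pos in range(min(click_pos + 1, K)):    # observed prefix
            item = ranking[pos]
            reward = 1.0 if pos == click_pos else 0.0
            pulls[item] += 1
            means[item] += (reward - means[item]) / pulls[item]
    return means, pulls
```

The empirical attraction estimate `means` is exactly the statistic that decides whether an item stays in the top-K, which is why corrupting the click feedback it is built from is an effective attack surface.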
📝 Abstract
Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. Despite this extensive adoption, however, the susceptibility of OLTR algorithms to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking several widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the top-K recommendation list for T − o(T) rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only O(log T) manipulations to succeed, and we supplement our theoretical analysis with empirical results on real-world data.
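The abstract does not spell out how CascadeOFA manipulates feedback, but the generic click-poisoning idea it relies on can be sketched as follows: on the few rounds where the attacker intervenes, a fake click on the target item (with any click above it suppressed) inflates the target's empirical attraction estimate. This is a hypothetical illustration under the cascade model, not the paper's actual strategy; `poison_feedback` and its arguments are invented for this sketch.

```python
def poison_feedback(ranking, clicks, target_item):
    """Hypothetical feedback-poisoning step, NOT the paper's CascadeOFA:
    whenever the target appears in the displayed list, force a click on
    it and erase any click ranked above it. Under cascade feedback this
    records a reward of 1 for the target and 0 for every item above it,
    inflating the target's UCB index relative to its competitors."""
    clicks = list(clicks)
    for pos, item in enumerate(ranking):
        if item == target_item:
            for p in range(pos):
                clicks[p] = False   # suppress clicks above the target
            clicks[pos] = True      # inject a fake click on the target
            break
    return clicks
```

Intuitively, because UCB-style confidence intervals shrink at a rate of roughly sqrt(log t / n), only a logarithmic number of corrupted rounds is needed before the bandit's own optimism keeps the target in the top-K on its own, which is consistent with the O(log T) manipulation bound claimed above.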