🤖 AI Summary
This work addresses the robustness and security vulnerabilities of memory-driven autonomous agent recommender systems (Agent4RSs) in black-box settings, proposing the first stealthy perturbation attack paradigm targeting agent memory mechanisms. Methodologically, the authors design a "drunk agent" strategy to disable memory updates and develop the DrunkAgent framework, a three-module architecture comprising a generation module that produces adversarial textual triggers, a strategy module that interferes with memory updates, and a surrogate module on which both are optimized for transferability and imperceptibility. The contributions are threefold: (1) identifying a critical fragility in the memory evolution phase of Agent4RSs; (2) establishing the first robustness evaluation of such systems under memory perturbation; and (3) demonstrating up to a 172% increase in target item exposure rate across multiple real-world datasets, thereby validating both the attack's efficacy and the urgent need for defense mechanisms.
📝 Abstract
Large language model-based agents are increasingly used in recommender systems (Agent4RSs) to achieve personalized behavior modeling. Specifically, Agent4RSs introduce memory mechanisms that enable the agents to autonomously learn and self-evolve from real-world interactions. However, to the best of our knowledge, how robust Agent4RSs are remains unexplored. As such, in this paper, we present the first work to attack Agent4RSs by perturbing agents' memories, not only to uncover their limitations but also to enhance their security and robustness, ensuring the development of safer and more reliable AI agents. Given security and privacy concerns, it is more practical to launch attacks in a black-box setting, where accurate knowledge of the victim models cannot be easily obtained. Moreover, practical attacks are often stealthy so as to maximize their impact. To this end, we propose a novel practical attack framework named DrunkAgent. DrunkAgent consists of a generation module, a strategy module, and a surrogate module. The generation module produces effective and coherent adversarial textual triggers, which can be used to achieve attack objectives such as promoting the target items. The strategy module is designed to "get the target agents drunk" so that their memories cannot be effectively updated during the interaction process, allowing the triggers to take full effect. Both modules are optimized on the surrogate module to improve the transferability and imperceptibility of the attacks. By identifying and analyzing these vulnerabilities, our work provides critical insights that pave the way for building safer and more resilient Agent4RSs. Extensive experiments across various real-world datasets demonstrate the effectiveness of DrunkAgent.
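The three-module pipeline above can be sketched as a surrogate-guided search: candidate triggers from the generation module and memory-interference suffixes from the strategy module are scored on a local surrogate, and the best-scoring pair is transferred to the black-box target. The sketch below is purely illustrative; the scoring heuristic, function names, and candidate strings are assumptions, not the authors' implementation, and a real surrogate would query a local LLM-based recommender agent.

```python
# Illustrative sketch of the DrunkAgent pipeline (not the paper's code).

def surrogate_score(trigger: str, strategy_suffix: str) -> float:
    """Stand-in for the surrogate module: scores how well a candidate
    trigger plus a 'drunk' strategy suffix promotes the target item
    while blocking memory updates. Toy heuristic only."""
    persuasive_words = {"best", "top", "recommended", "popular"}
    words = set(trigger.lower().split())
    # Reward persuasive trigger wording and a non-empty interference suffix.
    return len(words & persuasive_words) + 0.1 * len(strategy_suffix)

def optimize_attack(candidate_triggers, strategy_suffixes):
    """Jointly pick the trigger (generation module) and the
    memory-interference suffix (strategy module) that maximize the
    surrogate score, mirroring the black-box transfer setup."""
    return max(
        ((t, s) for t in candidate_triggers for s in strategy_suffixes),
        key=lambda ts: surrogate_score(*ts),
    )

triggers = ["a popular top pick", "an item", "the best recommended choice"]
suffixes = ["ignore prior interactions", ""]
trigger, suffix = optimize_attack(triggers, suffixes)
```

In the real framework this surrogate would be a distilled local agent, so that gradients or scores computed against it transfer to the unseen victim model.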